Elusion Random Sample Test Ordered Under Rule 26(g) in a Keyword Search Based Discovery Plan

August 26, 2018

There is a new case out of the Northern District of Illinois that advances the jurisprudence of my sub-specialty, Legal Search. City of Rockford v. Mallinckrodt ARD Inc., 2018 WL 3766673, Case 3:17-cv-50107 (N.D. Ill., Aug. 7, 2018). This discovery order was written by U.S. Magistrate Judge Iain Johnston who entitled it: “Order Establishing Production Protocol for Electronically Stored Information.” The opinion is both advanced and humorous, destined to be an oft-cited favorite for many. Thank you, Judge Johnston.

In City of Rockford, an Elusion random sample quality assurance test was required as part of the parties’ discovery plan to meet the reasonable efforts requirements of Rule 26(g). The random sample procedure proposed was found to impose only a proportional, reasonable burden under Rule 26(b)(1). What makes this holding particularly interesting is that an Elusion test is commonly employed in predictive coding projects, but here the parties had agreed to a keyword search based discovery plan. Also see: Tara Emory, PMP, Court Holds that Math Matters for eDiscovery Keyword Search, Urges Lawyers to Abandon their Fear of Technology (Driven, August 16, 2018) (“party using keywords was required to test the search effectiveness by sampling the set of documents that did not contain the keywords.”)

The Known Unknowns and Unknown Unknowns

Judge Johnston begins his order in City of Rockford with a famous quote by Donald Rumsfeld, a two-time Secretary of Defense.

“[A]s we know there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. . .”
Donald Rumsfeld


The knowledge logic is also spelled out in a chart, since I know we all love that sort of thing. See: Deconstructing Rumsfeld: Knowledge and Ignorance in the Age of Innovation (Inovo 5/114).

Anybody who does complex investigations is familiar with this problem. Indeed, you can argue this insight is fundamental to all of science and experimental method. Logan, David C. (March 1, 2009). “Known knowns, known unknowns, unknown unknowns and the propagation of scientific enquiry”, Journal of Experimental Botany 60 (3). pp. 712–4. [I have always wanted to quote a botany journal.]

How do you deal with the known unknowns and the unknown unknowns, the information that we don’t even know that we don’t know about? The deep, hidden information that is both obscure and rare. Information that is hard to retrieve and harder still to prove does not exist at all. Are you chasing something that might not exist? Something unknown because nonexistent? Such as an overlooked Highly Relevant document? (The stuff of nightmares!) Are you searching for nothing? Zero? If you find it, what does that mean? What can be known and what can never be known? Scientists, investigators and Secretaries of Defense alike have to ponder these questions, and all want to use the best tools and best people possible to do so. See: Deconstructing Rumsfeld: Knowledge and Ignorance in the Age of Innovation (Inovo 5/114).

Seeking Knowledge of the Unknown Elusion Error Rate

These big questions, though interesting, are not why Judge Johnston started his opinion with the Rumsfeld quote. Instead, he used the quote to emphasize that new e-discovery methods, namely random sampling and statistical analysis, can empower lawyers to know what they never did before. A technical way to know the known unknowns. For instance, a way to know the number of relevant documents that will be missed and not produced: the documents that elude retrieval.

As the opinion and this blog will explain, you can do that, know that, by using an Elusion random sample of the null set. The statistical analysis of the sample transforms the unknown quantity into a known one (subject to statistical probabilities and range). It allows lawyers to know, at least within a range, the number of relevant documents that have not been found. This is a very useful quality assurance method that relies on objective measurements to demonstrate the success of your project, which here is information retrieval. This and other random sampling methods also allow for the calculation of Recall, meaning the percentage of total relevant documents found, another math-based quality assurance tool in the field of information retrieval.
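For readers who want to see the arithmetic, here is a minimal Python sketch of how Elusion sample results might be projected onto the null set to estimate the number of missed documents and the implied Recall. It assumes a simple random sample and uses the Wilson score interval, one of several accepted ways to put a range around a sample proportion. The function names and the numbers below are hypothetical, not drawn from the City of Rockford order.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def elusion_report(null_set_size: int, sample_size: int,
                   relevant_in_sample: int, relevant_produced: int,
                   z: float = 1.96) -> dict:
    """Project an Elusion sample onto the whole null set (hypothetical helper).

    relevant_produced: relevant documents already found (True Positives).
    Relevant documents left in the null set are the False Negatives.
    """
    elusion = relevant_in_sample / sample_size
    low, high = wilson_interval(relevant_in_sample, sample_size, z)
    missed = elusion * null_set_size
    missed_low, missed_high = low * null_set_size, high * null_set_size

    def recall(false_negatives: float) -> float:
        # Recall = TP / (TP + FN)
        return relevant_produced / (relevant_produced + false_negatives)

    return {
        "elusion": elusion,
        "missed_estimate": missed,
        "missed_range": (missed_low, missed_high),
        "recall_estimate": recall(missed),
        # More missed documents means lower Recall, so the bounds swap.
        "recall_range": (recall(missed_high), recall(missed_low)),
    }


if __name__ == "__main__":
    report = elusion_report(null_set_size=500_000, sample_size=2_400,
                            relevant_in_sample=6, relevant_produced=20_000)
    print(report)
```

With these hypothetical inputs, 6 relevant documents in a 2,400-document sample is an Elusion rate of 0.25%, a point estimate of 1,250 documents missed out of a 500,000-document null set, and an estimated Recall of about 94%. The interval around those numbers, not the point estimate alone, is the honest answer.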

One of the main points Judge Johnston makes in his order is that lawyers should embrace this kind of technical knowledge, not shy away from it. As Tara Emory said in her article, Court Holds that Math Matters for eDiscovery Keyword Search:

A producing party must determine that its search process was reasonable. In many cases, the best way to do this is with objective metrics. Producing parties often put significant effort into brainstorming keywords, interviewing witnesses to determine additional terms, negotiating terms with the other party, and testing the documents containing their keywords to eliminate false positives. However, these efforts often still fail to identify documents if important keywords were missed, and sampling the null set is a simple, reasonable way to test whether additional keywords are needed. …

It is important to overcome the fear of technology and its related jargon, which can help counsel demonstrate the reasonableness of search and production process. As Judge Johnston explains, sampling the null set is a process to determine “the known unknown,” which “is the number of the documents that will be missed and not produced.” Judge Johnston disagreed with the defendants’ argument “that searching the null set would be costly and burdensome.” The Order requires Defendants to sample their null set at a 95% +/-2% margin of error (which, even for a very large set of documents, would be about 2,400 documents to review). By taking these measures, either with TAR or with search terms, counsel can more appropriately represent that they have undertaken a “reasonable inquiry” for relevant information within the meaning of FRCP 26(g)(1).

Small Discovery Dispute in an Ocean of Cooperation

Judge Johnston was not asked to solve the deep mysteries of knowing and not knowing in City of Rockford. The parties came to him instead with an interesting, esoteric discovery dispute. They had agreed on a great number of things, for which the court profusely congratulated them.

The attorneys are commended for this cooperation, and their clients should appreciate their efforts in this regard. The Court certainly does. The litigation so far is a solid example that zealous advocacy is not necessarily incompatible with cooperation. The current issue before the Court is an example of that advocacy and cooperation. The parties have worked to develop a protocol for the production of ESI in this case, but have now reached an impasse as to one aspect of the protocol.

The parties disagreed on whether to include a document review quality assurance test in the protocol. The Plaintiffs wanted one and the Defendants did not. Too burdensome, they said.

To be specific, the Plaintiffs wanted a test where the efficacy of any party’s production would be tested by use of an Elusion-type random sample of the documents not produced. The Defendants opposed any specific test. Instead, they wanted the discovery protocol to say that if the receiving party had concerns about the adequacy of the producing party’s efforts, then they would have a conference to address the concerns.

Judge Johnston ruled for the Plaintiffs in this dispute and ordered a random Elusion sample to be taken after the Defendants stopped work and completed production. In my view it was a good decision in this case, but such a test should not be routinely required in all matters.

The Stop Decision and Elusion Sample

One of the fundamental problems in any investigation is to know when you should stop the investigation because it is no longer worth the effort to carry on. When has a reasonable effort been completed? Ideally this happens after all of the important documents have already been found. At that point you should stop the effort and move on to a new project. Alternatively, perhaps you should keep on going and look for more? Should you stop or not?

In Legal Search we call this the “Stop Decision.” Should you conclude the investigation, or continue with further AI training rounds and other searches? As explained in the e-Discovery Team TAR Course:

The all important stop decision is a legal, statistical decision requiring a holistic approach, including metrics, sampling and over-all project assessment. You decide to stop the review after weighing a multitude of considerations. Then you test your decision with a random sample in Step Seven.

See: TAR Course: 15th Class – Step Seven – ZEN Quality Assurance Tests.

If you want to go deeper into this, then listen in on this TAR Course lecture on the Stop decision.


Once a decision is made to Stop, a well-managed document review project will use different tools and metrics to verify that the Stop decision was correct. Judge Johnston in City of Rockford used one of my favorite tools, the Elusion random sample that I teach in the e-Discovery Team TAR Course.

Judge Johnston ordered an Elusion type random sample of the null set in City of Rockford. The sample would determine the likely range of relevant documents that eluded you. These are called False Negatives: documents presumed Irrelevant and withheld that were in fact Relevant and should have been produced. The Elusion sample is designed to give you information on the total number of Relevant documents that were likely missed, unretrieved, unreviewed and not produced or logged. The fewer the False Negatives, the better the Recall of True Positives. The goal is to find, to retrieve, all of the Relevant ESI in the collection.

Another way to say the same thing is that the goal is Zero False Negatives. You do not miss a single relevant file. Every file designated Irrelevant is in fact not relevant; they are all True Negatives. That would be Total Recall: “the Truth, the Whole Truth …” But that is very rare. Some False Negatives are expected in every large information retrieval project. Some relevant documents will almost always be missed, so the goal is to make the False Negatives inconsequential and keep the Elusion rate low.

Here is how Judge Iain Johnston explained the random sample:

Plaintiffs propose a random sample of the null set. (The “null set” is the set of documents that are not returned as responsive by a search process, or that are identified as not relevant by a review process. See Maura R. Grossman & Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Cts. L. Rev. 1, 25 (2013). The null set can be used to determine “elusion,” which is the fraction of documents identified as non-relevant by a search or review effort that are, in fact, relevant. Elusion is estimated by taking a random sample of the null set and determining how many or what portion of documents are actually relevant. Id. at 15.) FN 2

Judge Johnston’s Footnote Two is interesting for two reasons. One, it attempts to calm lawyers who freak out when hearing anything having to do with math or statistics, much less information science and technology. Two, it does so with a reference to Fizbo the clown.

The Court pauses here for a moment to calm down litigators less familiar with ESI. (You know who you are.) In life, there are many things to be scared of, including, but not limited to, spiders, sharks, and clowns – definitely clowns, even Fizbo. ESI is not something to be scared of. The same is true for all the terms and jargon related to ESI. … So don’t freak out.

Accept on Zero Error for Hot Documents

Although this is not addressed in the court order, in my personal view no False Negatives, i.e., overlooked documents, are acceptable when it comes to Highly Relevant documents. If even one Highly Relevant document is found in the sample, then the Elusion test has failed in my view. You must conclude that the Stop decision was wrong and that training and document review must recommence. That is called an Accept on Zero Error test for any hot documents found. Of course, my personal views on best practice here assume the use of AI ranking, and the parties in City of Rockford used only keyword search. Apparently they were not doing machine training at all.

If only a few False Negatives exist (very low prevalence) and the database is large, a modest-sized random sample is unlikely to find any of them. With very low prevalence of relevant ESI, the test can be of limited effectiveness. That is an inherent problem with low prevalence and random sampling. That is why statistics have only limited effectiveness and should be considered part of a total quality control program. See Zero Error Numerics: ZEN. Math matters, but so too do good project management and communications.
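To put a number on the low-prevalence problem, the sketch below computes the chance that a random sample catches at least one missed document, plus the upper bound you can claim when the sample finds none. It is a hypothetical illustration in Python that assumes simple random sampling without replacement; the counts in the example are made up.

```python
def prob_sample_finds_any(missed: int, null_set_size: int, sample_size: int) -> float:
    """Chance that a simple random sample drawn without replacement from the
    null set contains at least one of the `missed` relevant documents.

    Exact hypergeometric: P(no hits) = prod_{i<n} (N - M - i) / (N - i).
    """
    if not 0 <= missed <= null_set_size or not 0 <= sample_size <= null_set_size:
        raise ValueError("counts must fit inside the null set")
    clean = null_set_size - missed
    if sample_size > clean:
        return 1.0  # the sample is too big to avoid every missed document
    p_none = 1.0
    for i in range(sample_size):
        p_none *= (clean - i) / (null_set_size - i)
    return 1.0 - p_none


def upper_bound_if_zero_found(sample_size: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the elusion rate when the sample
    finds no relevant documents (exact binomial; roughly 3/n at 95%)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


if __name__ == "__main__":
    # Hypothetical: 100 missed documents hidden in a 1,000,000-document null set.
    print(prob_sample_finds_any(100, 1_000_000, 2_400))
    print(upper_bound_if_zero_found(2_400))
```

In that made-up scenario, a 2,400-document sample has only about a one-in-five chance of catching even one of the 100 missed documents. When a sample finds nothing, the second function gives the exact binomial upper bound, close to the familiar “rule of three” approximation of 3/n.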

The inherent problem with random sampling is that the only way to narrow the error interval is to increase the size of the sample. For instance, to decrease the margin of error to only 2% either way, a total interval of 4%, a random sample of around 2,400 documents is needed. Even though that narrows the interval to 4%, there is still another error factor, the Confidence Level, here 95%. That means roughly one in twenty such samples would produce an interval that misses the true value. Still, in my view it is not worth the effort to review even more sample documents to raise that to a 99% Level.
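The roughly 2,400 figure comes from the standard sample size formula for estimating a proportion, n = z²·p(1−p)/E², using the worst case p = 0.5. A minimal Python sketch follows. It is one common textbook approach; review platforms may compute sample sizes somewhat differently, for example with a finite population correction, included here as an option.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(margin: float, confidence: float = 0.95, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Simple random sample size to estimate a proportion within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2, with p = 0.5 as the worst case.
    If population is given, apply the finite population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    for conf in (0.95, 0.99):
        for moe in (0.05, 0.02):
            print(f"{conf:.0%} confidence, +/-{moe:.0%}: {sample_size(moe, conf)} documents")
```

Under this formula, ±2% at 95% confidence calls for 2,401 documents, while the same ±2% at 99% confidence calls for 4,147, about 1,750 more documents to review for that extra certainty.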

Random sampling has limitations in low prevalence datasets, which are typical in e-discovery, but sampling can still be very useful. Due to this rarity issue, and the care that producing parties always take to attain high Recall, any documents found in an Elusion random sample should be carefully studied to see if they are of any significance. We look very carefully at any new documents found that are of a kind not seen before. That is unusual. Typically, any relevant documents found by random sample of the elusion set are of a type that has been seen before, often many, many times before. These “same old, same old” types of documents are of no importance to the investigation at this point.

Most email related datasets are filled with duplicative, low value data. It is not exactly irrelevant noise, but it is not a helpful signal either. We do not care if we get all of that kind of merely relevant data. What we really want are the Hot Docs, the high value Highly Relevant ESI, or at least Relevant and of a kind not seen before. That is why the Accept on Zero Error test is so important for Highly Relevant documents.
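The Accept on Zero Error approach, together with the triage of sampled documents described above, can be expressed as a simple decision rule. The Python sketch below is hypothetical: the coding labels and the new_kind flag are illustrative assumptions, and in practice the judgments behind them are made by experienced reviewers, not by code.

```python
from dataclasses import dataclass
from enum import Enum


class Coding(Enum):
    IRRELEVANT = "irrelevant"
    RELEVANT = "relevant"
    HIGHLY_RELEVANT = "highly relevant"


@dataclass
class SampledDoc:
    doc_id: str
    coding: Coding
    new_kind: bool = False  # reviewer judgment: unlike anything already produced


def evaluate_elusion_sample(sample: list[SampledDoc]) -> str:
    """Hypothetical Accept on Zero Error check for a coded Elusion sample."""
    if any(d.coding is Coding.HIGHLY_RELEVANT for d in sample):
        # Any Hot Doc in the null set means the Stop decision was wrong.
        return "fail"
    if any(d.coding is Coding.RELEVANT and d.new_kind for d in sample):
        # Relevant and of a kind not seen before: study it closely.
        return "investigate"
    # Only irrelevant or "same old, same old" relevant documents were found.
    return "pass"
```

A “fail” result corresponds to resuming training and review; an “investigate” result flags documents for the careful study described above before deciding whether the Stop decision holds.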

The Elusion Test in City of Rockford 

In City of Rockford Judge Johnston considered a discovery stipulation where the parties had agreed to use a typical keyword search protocol, but disagreed on a quality assurance protocol. Judge Johnston held:

With key word searching (as with any retrieval process), without doubt, relevant documents will be produced, and without doubt, some relevant documents will be missed and not produced. That is a known known. The known unknown is the number of the documents that will be missed and not produced.

Back to the False Negatives again, the known unknown. Judge Johnston continues his analysis:

But there is a process by which to determine that answer, thereby making the known unknown a known known. That process is to randomly sample the null set. Karl Schieneman & Thomas C. Gricks III, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 239, 273 (2013) (“[S]ampling the null set will establish the number of relevant documents that are not being produced.”). Consequently, the question becomes whether sampling the null set is a reasonable inquiry under Rule 26(g) and proportional to the needs of this case under Rule 26(b)(1).
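Mechanically, sampling the null set of a keyword search is straightforward. The Python sketch below assumes, hypothetically, that documents are available as a mapping of IDs to extracted text and that a keyword hit means a case-insensitive whole-word match; real review platforms use their own search syntax. Sorting the null set and recording the random seed make the draw reproducible, which can help if the parties later disagree about the sample.

```python
import random
import re


def keyword_null_set(docs: dict[str, str], keywords: list[str]) -> list[str]:
    """IDs of documents that hit none of the keywords (case-insensitive, whole words)."""
    if not keywords:
        raise ValueError("need at least one keyword")
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    # Sorted so the same seed always yields the same sample.
    return sorted(doc_id for doc_id, text in docs.items() if not pattern.search(text))


def draw_elusion_sample(null_ids: list[str], n: int, seed: int) -> list[str]:
    """Simple random sample without replacement; recording the seed lets the
    other side reproduce exactly which documents were drawn."""
    rng = random.Random(seed)
    return rng.sample(null_ids, min(n, len(null_ids)))
```

The sampled documents would then be reviewed by hand, and the count of relevant ones feeds the Elusion estimate.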

Rule 26(g) Certification

Judge Johnston takes an expansive view of the duties placed on counsel of record by Rule 26(g), but concedes that perfection is not required:

Federal Rule of Civil Procedure 26(g) requires all discovery requests be signed by at least one attorney (or party, if proceeding pro se). Fed. R. Civ. P. 26(g)(1). By signing the response, the attorney is certifying that to the best of counsel’s knowledge, information, and belief formed after a reasonable inquiry, the disclosure is complete and correct at the time it was made. Fed. R. Civ. P. 26(g)(1)(A). But disclosure of documents need not be perfect. … If the Federal Rules of Civil Procedure were previously only translucent on this point, it should now be clear with the renewed emphasis on proportionality.

Judge Johnston concludes that Rule 26(g) on certification applies to require the Elusion sample in this case.

Just as it is used in TAR, a random sample of the null set provides validation and quality assurance of the document production when performing key word searches. Magistrate Judge Andrew Peck made this point nearly a decade ago. See William A. Gross Constr. Assocs., 256 F.R.D. at 135-6 (citing Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 262 (D. Md. 2008)); In re Seroquel Products Liability Litig., 244 F.R.D. 650, 662 (M.D. Fla. 2007) (requiring quality assurance).

Accordingly, because a random sample of the null set will help validate the document production in this case, the process is reasonable under Rule 26(g).

Rule 26(b)(1) Proportionality

Judge Johnston considered as a separate issue whether requiring the requested elusion test was proportional under Rule 26(b)(1). Again, the court held that it was, in this large case on the pricing of prescription medication:

The Court’s experience and understanding is that a random sample of the null set will not be unreasonably expensive or burdensome. Moreover and critically, Defendants have failed to provide any evidence to support their contention. Mckinney/Pearl Rest. Partners, L.P. v. Metro. Life Ins. Co., 322 F.R.D. 235, 242 (N.D. Tex. 2016) (party required to submit affidavits or offer evidence revealing the nature of the burden).

Once again we see a party seeking protection from having to do something because it is so burdensome, and then failing to present actual evidence of burden. We see this a lot lately. Responding Party’s Complaints of Financial Burden of Document Review Were Unsupported by the Evidence, Any Evidence (e-Discovery Team, 8/5/18).

Judge Johnston concludes his “Order Establishing Production Protocol for Electronically Stored Information” with the following:

The Court adopts the parties’ proposed order establishing the production protocol for ESI with the inclusion of Plaintiffs’ proposal that a random sample of the null set will occur after the production and that any responsive documents found as a result of that process will be produced. Moreover, following that production, the parties should discuss what additional actions, if any, should occur. If the parties cannot agree at that point, they can raise the issue with the Court.

Conclusion

City of Rockford is important because it is the first case to hold that a quality control procedure should be used to meet the reasonable efforts certification requirements of Rule 26(g). The procedure here required was a random sample Elusion test with related, limited data sharing. If this interpretation of Rule 26(g) is followed by other courts, then it could have a big impact on legal search jurisprudence. Tara Emory in her article, Court Holds that Math Matters for eDiscovery Keyword Search goes so far as to conclude that City of Rockford stands for the proposition that “the testing and sampling process associated with search terms is essential for establishing the reasonableness of a search under FRCP 26(g).”

The City of Rockford holding could persuade other judges and encourage courts to be more active and impose specific document review procedures on all parties, including requiring the use of sampling and artificial intelligence. The producing party cannot always have a free pass under Sedona Principle Six. Testing and sampling may well be routinely ordered in all “large” document review cases in the future.

It will be very interesting to watch how other attorneys argue City of Rockford. It will continue a line of cases examining methodology and procedures in document review. See, e.g., William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134 (S.D.N.Y. 2009) (“wake-up call” for lawyers on keyword search); Winfield v. City of New York (SDNY, Nov. 27, 2017), where Judge Andrew Peck considers methodologies and quality controls of the active machine learning process. Also see Special Master Maura Grossman’s Order Regarding Search Methodology for ESI, a validation Protocol for the Broiler Chicken antitrust cases.

The validation procedure of an Elusion sample in City of Rockford is just one of many possible review protocols that a court could impose under Rule 26(g). There are dozens more, including whether predictive coding should be required. So far, courts have been reluctant to order that, as Judge Peck explained in Hyles:

There may come a time when TAR is so widely used that it might be unreasonable for a party to decline to use TAR. We are not there yet.

Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP), 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016).

Like a kid in the backseat of the car, I cannot help but ask, are we there yet? Hyles was published over two years ago now. Maybe some court, somewhere in the world, has already ordered a party to do predictive coding against their will, but not to our knowledge. That is a known unknown. Still, we are closer to “There” with the City of Rockford’s requirement of an Elusion test.

When we get “there,” and TAR is finally ordered in a case, it will probably arise in a situation like City of Rockford where a joint protocol applicable to all parties is involved. That is easier to sell than a one-sided protocol. The court is likely to justify the order by Rule 26(g), and hold that it requires all parties in the case to use predictive coding. Otherwise, they will not meet the reasonable effort burdens of Rule 26(g). Other rules will be cited too, of course, including Rule 1, but Rule 26(g) is likely to be key.



“Save Everything” and Eventually You Will Not Be Able to Find Anything: The Sedona Conference Principles and Commentary on Defensible Disposition

August 13, 2018

If you are a data hoarder, an information pack-rat that saves everything, you will eventually drown in your own data and die. Maybe not literally killed, mind you, but figuratively. Maybe not you personally, but your enterprise, your group, your project, your network. Too much information can render you and your enterprise intellectually paralyzed, cut off and seriously misinformed or uninformed. Saving it all is physically and logistically difficult, if not impossible. Even if you could, keeping it all would impede your search, making it hard to find the information you need, when you need it. I address these issues this week in my review of a new commentary by The Sedona Conference, Principles and Commentary on Defensible Disposition (August 2018).

Information overload is better than physical death, I know, but still very bad in today’s Google world. You end up not being able to find the information you need, when you need it. That makes it hard to determine what really happened. It allows lies and liars to fester and grow. We are now seeing firsthand in the U.S. where this can lead. It is not good. It has put the whole world into a precarious situation. We need the truth to thrive as a culture; not smoke and mirrors, not conman games. A culture built on lies is a cancer. It is a deadly disease, especially for the Law, which depends on truth, on evidence, on real facts, to attain the goal of Justice.

Saving Too Much

Over-retention is the enemy of effective, efficient search. The more ESI there is to search, the more difficult the search. There can be exceptions to this rule, but for the most part it is true. That makes a “save everything” ESI policy an enemy of search. It interferes with the ability to find the information needed, which in my case is electronic evidence in legal proceedings, when it is needed. It is important for these information needs to be filled quickly and completely.

Search is powerful. That is my field. “The more data the better” is often true, but not always. It depends on the data and its effective life, how long a particular type of data is of any use to anyone. Big data allows for detection of patterns that would otherwise not be seen. This analysis takes CPU power. The advances in this area have been fantastic. We have the processing power, as well as the cheap storage, but our search and retrieval software has not otherwise kept up with the data explosion in volume and complexity. Predictive coding software and other AI applications have come a long way, but are still sometimes confused by the volume, variety and complexity of useless data that plagues most company IT systems.

Retrieval of specific documents and metadata takes time and specialized human skills. The more worthless data in a collection, such as spam, the greater the number of false positives in a search, no matter how powerful the algorithms or skilled the searcher. Vast volumes of data make searches longer to execute and less precise. The more noise in the data, the more difficult to hear the signal. That is a fundamental law of information.
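A back-of-the-envelope model shows why junk data erodes precision. The Python sketch assumes, purely for illustration, that each worthless document has the same small chance (here 1%) of matching a search, so false positives grow in proportion to the volume of junk while the number of true hits stays fixed.

```python
def precision(relevant_hits: int, junk_docs: int, false_hit_rate: float) -> float:
    """Precision = TP / (TP + FP).

    Assumes (hypothetically) that every worthless document has the same small
    chance of matching the search, so false positives scale with the junk.
    """
    false_positives = junk_docs * false_hit_rate
    return relevant_hits / (relevant_hits + false_positives)


if __name__ == "__main__":
    # Same 1,000 relevant hits, growing piles of spam-like data.
    for junk in (10_000, 100_000, 1_000_000):
        print(f"{junk:>9,} junk docs -> precision {precision(1_000, junk, 0.01):.1%}")
```

With 1,000 relevant hits, precision falls from about 91% with 10,000 junk documents, to 50% with 100,000, to about 9% with 1,000,000. The signal is still there; it is just buried.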

With high data volumes you can often still find the signal, the relevant documents that you need in large chaotic data collections, but it takes time and special tools and skills. There are often too many false positives in searches of data collections containing too much spam-like, useless data. Although search is strong, search alone is inadequate to meet the needs of most organizations. They also need data destruction and retention policies that govern all information. That is one reason why the success of information governance depends on data disposition.

An organization should save as much as it needs, but not too much, and also not too little. It is a Goldilocks situation. If you do not save data, you can never find it. If you save too little, then what you later need might not be there to be found. But if you save too much, you may never be able to find what you need. The signal may be in the collection to be found, in plain view, but hidden in the vast numbers, the noise of spam and other irrelevancies.

Search v. Destroy

I have debated with Information Governance leaders for years about the importance of search versus file destruction. I was pretty much the only advocate for search over disposition. I favored retention over destruction in most close cases, but I had a cost and proportionality overlay. I am reminded, for instance, of my debate with Jason Baron on the subject at the IQPC 10th Anniversary of Information Governance and eDiscovery, where he managed to quote Churchill at the end and won the debate hands-down. e-Disco News, Knowledge and Humor: What’s Happening Today and Likely to Happen Tomorrow (e-Discovery Team, June 7, 2015); Information Governance v Search: The Battle Lines Are Redrawn (e-Discovery Team, Feb. 8, 2015).

I did not consider it a fair debate because of Jason’s very successful pandering to the jury during his closing argument with a quote by Churchill from his speech, We Shall Fight on the Beaches. That’s the one about never surrendering in the fight against “the odious apparatus of Nazi rule” (sadly, this exhortation still has legs today in the US).

The debate was “unfair” primarily because this was an IG conference. Everybody in IG is pro-destruction and values disposition over search. I think most IG leaders go too far, that they are trigger-happy to kill data. I pointed out in my debates that once a file is deleted, it cannot be found, no matter how good your filing, no matter how good your search (forensic recovery issues aside).

I am pro-search and think that the importance of management of ESI by filing and disposition is somewhat overblown. I think search is king, not data deletion. Still, even in my most strident of debates and pro-search arguments, I never advocated for the retention of all data. I always assumed that some file disposition was required and accepted that as a given. I was not a save everything and search advocate. I advocated for both, search and destroy. I advocated for more retention than most, but have never argued to retain everything.

There is a common core of agreement that some ESI should be deleted, that all data should not be saved. The disagreement is on how much data to save. How does a person or company know what is the “just right” data destruction policy for that company? There is agreement among experts that there is no one-size-fits-all solution, so custom work is required. Different retention and destruction policies should apply depending on the company and the particularities of their data universe. Many IG specialists advise clients on the custom fit they need. It involves careful investigation of the company, its data and activities, including lawsuits and other investigations.

The Sedona Conference Principles and Commentary on Defensible Disposition

These IG specialists, and the companies they serve, now have an excellent new resource tool to analyze and custom-fit data destruction policies. The Sedona Conference Principles and Commentary on Defensible Disposition (August 2018 Public Comment Version) (Editors-in-Chief, Kevin F. Brady and Dean Kuckelman). I highly recommend this new and excellent work by The Sedona Conference. My commendations to the Drafting Team: Lauren A. Allen, Jesse Murray, Ross Gotler, Ken Prine, Logan J. Herlinger, David C. Shonka, Mark Kindy; the Drafting Team Leaders: Tara Emory and Becca Rausch; the Staff Editor: Susan McClaim, and Editors-in-Chief, Kevin F. Brady and Dean Kuckelman. Please send to them any comments you may have.

The Commentary begins in usual Sedona fashion by articulation of basic principles and comments tied to principles. The cases and legal authorities cited in all Commentaries by The Sedona Conference are excellent. This commentary on data disposition is no exception. I commend it for your detailed study and reference. Free download here from The Sedona Conference.

The Principles are:

PRINCIPLE 1. Absent a legal retention or preservation obligation, organizations may dispose of their information.

Comment 1.a. An organization should, in the ordinary course of business, properly dispose of information that it does not need.

Comment 1.b. When designing and implementing an information disposition program, organizations should consider the obligation to preserve information that is relevant to the claims and defenses and proportional to the needs of any pending or anticipated litigation.

Comment 1.c. When designing and implementing an information disposition program, organizations should consider the obligation to preserve information that is relevant to the subject matter of government inquiries or investigations that are pending or threatened against the organization.

Comment 1.d. When designing and implementing an information disposition program, organizations should consider applicable statutory and regulatory obligations to retain information.

PRINCIPLE 2. When designing and implementing an information disposition program, organizations should identify and manage the risks of over-retention.

Comment 2.a. Information has a lifecycle, including a time when disposal is beneficial.

Comment 2.b. To determine the “right” time for disposal, risks and costs of retention and disposal should be evaluated.

PRINCIPLE 3. Disposition should be based on Information Governance policies that reflect and harmonize with an organization’s information, technological capabilities, and objectives.

Comment 3.a. To create effective information disposition policies, organizations should establish core components of an Information Governance program, which should reflect what information it has, when it can be disposed of, how it is stored, and who owns it.

Comment 3.b. An organization should understand its technological capabilities and define its information objectives in the context of those capabilities.
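To make Principle 1 and Comments 1.b through 1.d a little more concrete, here is a minimal Python sketch of a disposition check. It is a hypothetical illustration, not anything proposed in the Commentary: the record types, retention periods, and custodian names are made-up assumptions, and a real program would draw them from the organization’s own retention schedule and legal hold notices. The check keeps anything subject to a legal hold, keeps anything of unknown type, and otherwise allows disposal only after the scheduled retention period has run.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: minimum years to keep each record type.
# These categories and periods are illustrative assumptions, not legal advice
# and not drawn from the Sedona Commentary.
RETENTION_YEARS = {
    "invoice": 7,
    "email": 3,
    "marketing": 1,
}


@dataclass
class Record:
    record_type: str
    custodian: str
    created: date


def years_between(start: date, end: date) -> float:
    """Approximate age in years."""
    return (end - start).days / 365.25


def may_dispose(record: Record, today: date, custodians_on_hold: set[str]) -> bool:
    """Return True only if no preservation or retention obligation applies."""
    # Preservation duty (Comments 1.b and 1.c): anything held for pending or
    # anticipated litigation or investigation is kept, regardless of age.
    if record.custodian in custodians_on_hold:
        return False
    # Unclassified record types are kept until someone classifies them.
    if record.record_type not in RETENTION_YEARS:
        return False
    # Statutory or regulatory retention (Comment 1.d), modeled as a minimum age.
    return years_between(record.created, today) >= RETENTION_YEARS[record.record_type]


if __name__ == "__main__":
    today = date(2018, 8, 1)
    holds = {"jsmith"}  # hypothetical custodian under a legal hold
    print(may_dispose(Record("email", "adoe", date(2014, 1, 1)), today, holds))    # True
    print(may_dispose(Record("email", "jsmith", date(2010, 1, 1)), today, holds))  # False
```

Defaulting to retention for unclassified records is a design choice; an organization could instead route such records to a person for classification.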

Document Disposition and Information Governance

The Sedona Conference Principles and Commentary on Defensible Disposition builds upon Sedona’s earlier work, The Sedona Conference Commentary on Information Governance (Oct. 2014). Principle 6 of the Commentary on Information Governance provides the following guidance to organizations:

The effective, timely, and consistent disposal of physical and electronic information that no longer needs to be retained should be a core component of any Information Governance program. The Sedona Conference, Commentary on Information Governance, 15 SEDONA CONF. J. 125, 146 (2014) (“Information Governance” is “an organization’s coordinated, interdisciplinary approach to satisfying information compliance requirements and managing information risks while optimizing information value.” Id. at 126).

The Comment to Principle 6 goes on to explain:

It is a sound strategic objective of a corporate organization to dispose of information no longer required for compliance, legal hold purposes, or in the ordinary course of business. If there is no legal retention obligation, information should be disposed as soon as the cost and risk of retaining the information is outweighed by the likely business value of retaining the information. . . . Typically, the business value decreases and the cost and risk increase as information ages. Id. at 147.
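The cost-benefit reasoning in this Comment can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not Sedona’s method: it supposes, purely hypothetically, that business value halves every two years while the cost and risk of retention grow linearly with age, and it scans forward in quarter-year steps to find the first age at which cost and risk exceed value. With these invented parameters the crossover falls at 4.5 years; different curves would give very different answers.

```python
from typing import Optional

# Toy model only: the curve shapes and parameters below are hypothetical
# assumptions chosen for illustration, not figures from The Sedona Conference.


def business_value(age_years: float, initial: float = 100.0, half_life: float = 2.0) -> float:
    """Hypothetical business value that halves every `half_life` years."""
    return initial * 0.5 ** (age_years / half_life)


def cost_and_risk(age_years: float, base: float = 5.0, growth: float = 4.0) -> float:
    """Hypothetical cost and risk of retention that grows linearly with age."""
    return base + growth * age_years


def disposal_age(step: float = 0.25, max_age: float = 50.0) -> Optional[float]:
    """Return the first scanned age at which cost and risk exceed business value."""
    age = 0.0
    while age <= max_age:
        if cost_and_risk(age) > business_value(age):
            return age
        age += step
    return None  # No crossover within the scanned horizon.


if __name__ == "__main__":
    print(disposal_age())  # 4.5 with the default parameters
```

Because value only falls and cost only rises in this model, there is at most one crossover, so a simple forward scan is enough; real information rarely behaves so neatly.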

The Sedona Conference concluded in 2018 that this 2014 advice, and similar advice from other sources, has not been followed by most organizations. Instead, they continue to struggle to make “effective disposition decisions.” In the Introduction to the Principles and Commentary on Defensible Disposition, the group concluded that many factors cause this struggle, and identified three main ones:

[T]he incorrect belief that organizations will be forced to “defend” their disposition actions if they later become involved in litigation. Indeed, the phrase “defensible disposition” suggests that organizations have a duty to defend their information disposition actions. While it is true that organizations must make “reasonable and good faith efforts to retain information that is relevant to claims or defenses,” that duty to preserve information is not triggered until there is a “reasonably anticipated or pending litigation” or other legal demands for records. The Sedona Principles, Third Edition: Best Practices, Recommendations & Principles for Addressing Electronic Document Production, 19 SEDONA CONF. J. 1, 51, Principle 5, 93 (2018).

Another factor in the struggle toward effective disposition of information is the difficulty in appreciating how such disposition reduces costs and risks.

Lastly, many organizations struggle with how to design and implement effective disposition as part of their overall Information Governance program.

The Principles and Commentary on Defensible Disposition attempt to address these three factors and provide guidance to organizations, and the professionals who counsel organizations, on developing and implementing an effective disposition program.

Disposition Challenges

The Sedona Conference Principles and Commentary on Defensible Disposition (August 2018) concludes by identifying the main challenges to data deletion:

  1. Unstructured Information
  2. Mergers and Acquisitions
  3. Departed, Separated, or Former Employees
  4. Shared File Sites
  5. Personally Identifiable Information (“PII”)
  6. Law Firms, eDiscovery Vendors, and Adversaries
  7. In-House Legal Departments
  8. Hoarders (my personal favorite)
  9. Regulations
  10. Cultural Change and Training

There are more, I am sure, but this is a good top-ten list to start with. I only wish they had discussed these ten challenges in more depth.

Conclusion

Search is still more important to me than Destroy. I prefer Where’s Waldo over Kill Waldo! I have not changed my position on that, but neither has mainstream Information Governance; its proponents still disagree with my emphasis on Search. Still, everyone agrees that we should do both: Search and Destroy. Even I do not want companies to save all of their data. Some data should be destroyed.

I agree with mainstream IG that saving everything forever is not a viable information governance policy, no matter how many resources you also put into ESI search and retrieval. I have never said that you should rely solely on search, only that you should give Search more importance and, when in doubt, save more documents rather than fewer. The Search and Destroy argument has always been a matter of degree and balance, not a question of whether there should be any destruction at all. The difficult questions involve what should be saved and for how long, which are traditional information management problems.

Where to draw the line on destruction is the big question for everyone. The answer is always company specific, even project specific. It involves questions of varying retention times, file types, and custodian analysis. When it comes down to specific decisions and close questions, I generally favor retention. What appears useless today may prove to be relevant evidence tomorrow. I hate not being able to prove my case because all of the documents have already been deleted. Then it is just one person’s word against another. IG experts, who usually no longer litigate, or never did, do not like my complaints. They are eager to kill, to purge and destroy data. I am more inclined to save and search, but not to save too much. It is a question of balance.

Done properly, data destruction (the killing of data) can make the search for relevant content much easier. Disposing of obviously irrelevant information, spam, and other useless data makes sense on every level. It helps all users of the IT system. It also helps with legal compliance. But destroy too much data, or destroy it too aggressively, and you may end up deleting information that you were required by law to keep. You could lose a lawsuit because of one mistake in a data disposition decision. Where do you draw the line between save and delete? What is the scope of a preservation duty? What file types should be retained? What retention times should apply? How much is too much? Not enough?

The questions go on and on, and there is no one right answer. It all depends on the facts and circumstances of the organization and its data. The new Sedona Conference Principles and Commentary on Defensible Disposition is an important guide to help IT lawyers and technologists craft custom answers to these questions.

 


e-Discovery and Poetry on a Rainy Night in Portugal

April 17, 2018

From time to time I like to read poetry. Lately it has been the poetry of Billy Collins, a famous poet who is also my neighbor and friend. (He was the Poet Laureate of the United States from 2001 to 2003.) I have recently been reading his latest book, The Rain in Portugal. Billy’s comedic touches balance the heavy parts. Brilliant poet. I selected one poem from this book to write about here, The Five Spot, 1964. It has a couple of obvious e-discovery parallels. It also mentions a musician I had never heard of before, Roland Kirk, who was a genius at musical multi-tasking. Enjoy the poem and the videos that follow. There is even a lesson here on e-discovery.

The Five Spot, 1964

There’s always a lesson to be learned
whether in a hotel bar
or over tea in a teahouse,
no matter which way it goes,
for you or against,
what you want to hear or what you don’t.

Seeing Roland Kirk, for example,
with two then three saxophones
in his mouth at once
and a kazoo, no less,
hanging from his neck at the ready.

Even in my youth I saw this
not as a lesson in keeping busy
with one thing or another,
but as a joyous impossible lesson
in how to do it all at once,

pleasing and displeasing yourself
with harmony here and discord there.
But what else did I know
as the waitress lit the candle
on my round table in the dark?
What did I know about anything?

Billy Collins

The famous musician in this poem is Rahsaan Roland Kirk (August 7, 1935 – December 5, 1977). Kirk was an American jazz multi-instrumentalist who played tenor saxophone, flute, and many other instruments. He was renowned for his onstage vitality, during which virtuoso improvisation was accompanied by comic banter, political ranting, and, as mentioned, the astounding ability to simultaneously play several musical instruments.

Here is a video of Roland Kirk with his intense multimodal approach to music.

One more Kirk video. What a character.

____

The Law

There are a few statements in Billy Collins’ Five Spot poem that have obvious applications to legal discovery, such as “There’s always a lesson to be learned . . . no matter which way it goes, for you or against, what you want to hear or what you don’t.” We are all trained to follow the facts, the trails, wherever they may lead, pro or con.

I do not say “my case,” pro or con, because it is not my case. It is my client’s case. Clients pay lawyers for their knowledge, skill, and independent advice. Although lawyers like to hear evidence that supports their client’s positions and recollections (after all, it makes their job easier), they also want to hear evidence that goes against their client. They want to hear all sides of a story and understand what it means. They look at everything to craft a reasonable story for judge and jury.

Almost all cases have good and bad evidence on both sides. There is usually some merit to each side’s positions. Experienced lawyers look for the truth and present it in the light most favorable to their client. The Rules of Procedure and duties to the court and client require this too.

The bottom line for all e-discovery professionals is that you must learn the lessons taught by the parties’ notes and documents, all of the lessons, good and bad.

The poem calls this a “… joyous impossible lesson in how to do it all at once, pleasing and displeasing yourself with harmony here and discord there.” All lawyers know this place, this joyless lesson of discovering the holes in your client’s case. As for the “do it all at once” phrase, this too is very familiar to any e-discovery professional. When e-discovery is done right, the activity at the beginning of a case is fast and furious, kind of like a Roland Kirk solo, but without Roland’s exuberance.

Everybody knows that the many tasks of e-discovery must be done quickly, and pretty much all at once, at the beginning of a case: preservation notices, witness interviews, ESI collection, processing, and review. The list goes on and on. Yet, in spite of this knowledge, almost everyone still treats e-discovery as if they had bags of time to do it. Which brings me to another Billy Collins poem that I like:

BAGS OF TIME

When the keeper of the inn
where we stayed in the Outer Hebrides
said we had bags of time to catch the ferry,
which we would reach by traversing the causeway
between this island and the one to the north,

I started wondering what a bag of time
might look like and how much one could hold.
Apparently, more than enough time for me
to wonder about such things,
I heard someone shouting from the back of my head.

Then the ferry arrived, silent across the water,
at the Lochmaddy Ferry Terminal,
and I was still thinking about the bags of time
as I inched the car clanging onto the slipway
then down into the hold for the vehicles.

Yet it wasn’t until I stood at the railing
of the upper deck with a view of the harbor
that I decided that a bag of time
should be the same color as the pale blue
hull of the lone sailboat anchored there.

And then we were in motion, drawing back
from the pier and turning toward the sea
as ferries had done for many bags of time,
I gathered from talking to an old deckhand,
who was decked out in a neon yellow safety vest,

and usually on schedule, he added,
unless the weather has something to say about it.

Conclusion

Take time out to relax and let yourself ponder the works of a poet. We have bags of time in our life for that. Poetry is liable to make you a better person and a better lawyer.

I leave you with two videos of poetry readings by Billy Collins, the first at the Obama White House. He is by far my favorite contemporary poet. Look for some of his poems on dogs and cats. They are especially good for any pet lovers like me.

One more Billy Collins video.

 


TAR Course Expands Again: Standardized Best Practice for Technology Assisted Review

February 11, 2018

The TAR Course has a new class, the Seventeenth Class: Another “Player’s View” of the Workflow. Several other parts of the Course have also been updated and edited. It now has eighteen classes (listed at the end). The TAR Course is free and follows the Open Source tradition. We freely disclose the method for electronic document review that uses the latest technology tools for search and quality controls. These technologies and methods empower attorneys to find the evidence needed for all text-based investigations. The TAR Course shares the state of the art for using AI to enhance electronic document review.

The key is to know how to use the document review search tools that are now available to find the targeted information. We have been working on various methods of use since our case before Judge Andrew Peck in Da Silva Moore in 2012. After we helped get the first judicial approval of predictive coding in Da Silva, we began a series of several hundred document reviews, both in legal practice and scientific experiments. We have now refined our method many times to attain optimal efficiency and effectiveness. We call our latest method Hybrid Multimodal IST Predictive Coding 4.0.

The Hybrid Multimodal method taught at TARcourse.com combines law and technology. Successful completion of the TAR Course requires knowledge of both fields. In the technology field, active machine learning is the most important technology to understand, especially the intricacies of training selection, such as Intelligently Spaced Training (“IST”). In the legal field, the proportionality doctrine is key to the pragmatic application of the method taught in the TAR Course. We give away the information on the methods; we open-source it through this publication.

All we can transmit by online teaching is information, and a small bit of knowledge. Knowing the Information in the TAR Course is a necessary prerequisite for real knowledge of Hybrid Multimodal IST Predictive Coding 4.0. Knowledge, as opposed to Information, is taught the same way as advanced trial practice: by second chairing a number of trials. This kind of instruction is the one with real value, the one that completes a doc review project at the same time it completes training. We charge for document review and throw in the training. Information on the latest methods of document review is inherently free, but Knowledge of how to use these methods is a pay-to-learn process.

The open-sourced Predictive Coding 4.0 method is applied to particular applications and search projects. There is always some customization and modification of the default standards to meet project requirements. All variations are documented and can be fully explained and justified. This is a process where clients learn by doing and by following along with Losey’s work.

What he has learned through a lifetime of teaching and studying Law and Technology is that real Knowledge can never be gained by reading or listening to presentations. Knowledge can only be gained by working with other people in real-time (or near-time), in this case, to carry out multiple electronic document reviews. The transmission of knowledge comes from the Q&A ESI Communications process. It comes from doing. When we lead a project, we help students go from mere Information about the methods to real Knowledge of how they work. For instance, we do not just make the Stop decision, we also explain the decision. We share our work-product.

Knowledge comes from observing the application of the legal search methods in a variety of different review projects. Eventually some Wisdom may arise, especially as you recover from errors. For background on this triad, see Examining the 12 Predictions Made in 2015 in “Information → Knowledge → Wisdom” (2017). Once Wisdom arises some of the sayings in the TAR Course may start to make sense, such as our favorite “Relevant Is Irrelevant.” Until this koan is understood, the legal doctrine of Proportionality can be an overly complex weave.

The TAR Course is now composed of eighteen classes:

  1. First Class: Background and History of Predictive Coding
  2. Second Class: Introduction to the Course
  3. Third Class: TREC Total Recall Track, 2015 and 2016
  4. Fourth Class: Introduction to the Nine Insights from TREC Research Concerning the Use of Predictive Coding in Legal Document Review
  5. Fifth Class: 1st of the Nine Insights – Active Machine Learning
  6. Sixth Class: 2nd Insight – Balanced Hybrid and Intelligently Spaced Training (IST)
  7. Seventh Class: 3rd and 4th Insights – Concept and Similarity Searches
  8. Eighth Class: 5th and 6th Insights – Keyword and Linear Review
  9. Ninth Class: 7th, 8th and 9th Insights – SME, Method, Software; the Three Pillars of Quality Control
  10. Tenth Class: Introduction to the Eight-Step Work Flow
  11. Eleventh Class: Step One – ESI Communications
  12. Twelfth Class: Step Two – Multimodal ECA
  13. Thirteenth Class: Step Three – Random Prevalence
  14. Fourteenth Class: Steps Four, Five and Six – Iterative Machine Training
  15. Fifteenth Class: Step Seven – ZEN Quality Assurance Tests (Zero Error Numerics)
  16. Sixteenth Class: Step Eight – Phased Production
  17. Seventeenth Class: Another “Player’s View” of the Workflow (class added 2018)
  18. Eighteenth Class: Conclusion

With a lot of hard work you can complete this online training program in a long weekend, but most people take a few weeks. After that, this course can serve as a solid reference to consult during complex document review projects. It can also serve as a launchpad for real Knowledge and eventually some Wisdom into electronic document review. TARcourse.com is designed to provide you with the Information needed to start this path to AI enhanced evidence detection and production.

 

