Announcing the e-Discovery Team’s TAR Training Program: 16 Classes, All Online, All Free – The TAR Course

March 19, 2017

We launch today a sixteen-class online training program on Predictive Coding: the e-Discovery Team TAR Course. This is a “how to” course on predictive coding. We have a long descriptive name for our method, Hybrid Multimodal IST Predictive Coding 4.0. By the end of the course you will know exactly what that means. You will also understand the seventeen key things you need to know to do predictive coding properly, shown in this diagram.


Hands-on hacking of predictive coding document reviews has been my obsession since Da Silva went viral. Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012). That is the case where I threw Judge Peck the softball opportunity to approve predictive coding for the first time. See: Judge Peck Calls Upon Lawyers to Use Artificial Intelligence and Jason Baron Warns of a Dark Future of Information Burn-Out If We Don’t.

Alas, because of my involvement in Da Silva I could never write about it, but I can tell you that none of the thousands of commentaries on the case have told the whole nasty story, including the outrageous “alternate fact” attacks by plaintiff’s counsel on Judge Andrew Peck and me. I guess I should just take the failed attempts to knock me and the Judge out of the case as flattery, but it still leaves a bad taste in my mouth. A good judge like Andy Peck did not deserve that kind of treatment. 

At the time of Da Silva, 2012, my knowledge of predictive coding was mostly theoretical and informational. But now, after “stepping-in” for five years to actually make the new software work, it is practical. For what “stepping-in” means, see the excellent book on artificial intelligence and future employment by Professor Thomas Davenport and Julia Kirby, titled Only Humans Need Apply (HarperBusiness, 2016). Also see: Dean Gonsowski, A Clear View or a Short Distance? AI and the Legal Industry, and Gonsowski, A Changing World: Ralph Losey on “Stepping In” for e-Discovery (Relativity Blog).

If you are looking to craft a speciality in the law that rides the new wave of AI innovations, then electronic document review with TAR is a good place to start. See Part Two of my January 22, 2017 blog, Lawyers’ Job Security in a Near Future World of AI. This is where the money will be.

 

Our TAR Course is designed to teach this practical, stepping-in based knowledge. The link to the course will always be shown on this blog at the top of the page. The TAR page next to it has related information.

Since Da Silva we have learned a lot about the actual methods of predictive coding. This is hands-on learning through actual cases and experiments, including sixty-four test runs at TREC in 2015 and 2016.

We have come to understand very well the technical details, the ins and outs of legal document review enhanced by artificial intelligence, AI-enhanced review. That is what TAR and predictive coding really mean, the use of active machine learning, a type of specialized artificial intelligence, to find the key documents needed in an investigation. In the process I have written over sixty articles on the subject of TAR, predictive coding and document review, most of them focused on what we have learned about methods.
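For readers curious what “active machine learning” looks like in practice, here is a minimal, generic sketch of the core loop, assuming the open-source scikit-learn library. It is only an illustration of the concept, not our Predictive Coding 4.0 method and not the inner workings of any commercial review platform; the uncertainty-sampling step shown is just one of several ways a tool can pick the next documents for a lawyer to review.

```python
# A minimal, generic sketch of active machine learning for document review.
# Illustration only -- NOT the e-Discovery Team's Predictive Coding 4.0 method
# and not any vendor's product. Assumes scikit-learn and numpy are installed.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def next_batch_to_review(documents, labels, batch_size=10):
    """documents: list of document texts.
    labels: parallel list with 1 (relevant), 0 (irrelevant), or None (not yet coded).
    Returns indices of uncoded documents the model is least certain about,
    so a human reviewer can code those next."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(documents)

    coded = [i for i, y in enumerate(labels) if y is not None]
    uncoded = [i for i, y in enumerate(labels) if y is None]

    # Train on what the human has coded so far (needs at least one of each class).
    model = LogisticRegression(max_iter=1000)
    model.fit(X[coded], [labels[i] for i in coded])

    # Probability that each uncoded document is relevant.
    probs = model.predict_proba(X[uncoded])[:, 1]

    # Uncertainty sampling: pick the documents closest to the 50/50 line.
    order = np.argsort(np.abs(probs - 0.5))
    return [uncoded[i] for i in order[:batch_size]]
```

In a real review the loop repeats: the lawyer codes the suggested batch, the model retrains on the larger coded set, and the machine ranking keeps improving. A hybrid multimodal approach also mixes in keyword, concept and similarity searches rather than relying on the machine’s suggestions alone.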

The TAR Course is the first time we have put all of this information together in a systematic training program. In sixteen classes we cover all seventeen topics, and much more. The result is an online instruction program that can be completed in one long weekend. After that it can serve as a reference manual. The goal is to help you to step-in and improve your document review projects.

The TAR Course has sixteen classes, listed below. Click on some and check them out. All free. We do not even require registration. No tests either, but someday soon that may change. Stay tuned to the e-Discovery Team. This is just the first step, dear readers, of my latest hack of the profession. Change we must, and not just gradual change, but radical. That is the only way the Law can keep up with the accelerating advances in technology. Taking the TAR Course is a minimum requirement and will get you ready for the next stage.

  1. First Class: Introduction
  2. Second Class: TREC Total Recall Track
  3. Third Class: Introduction to the Nine Insights Concerning the Use of Predictive Coding in Legal Document Review
  4. Fourth Class: 1st of the Nine Insights – Active Machine Learning
  5. Fifth Class: Balanced Hybrid and Intelligently Spaced Training
  6. Sixth Class: Concept and Similarity Searches
  7. Seventh Class: Keyword and Linear Review
  8. Eighth Class: GIGO, QC, SME, Method, Software
  9. Ninth Class: Introduction to the Eight-Step Work Flow
  10. Tenth Class: Step One – ESI Communications
  11. Eleventh Class: Step Two – Multimodal ECA
  12. Twelfth Class: Step Three – Random Prevalence
  13. Thirteenth Class: Steps Four, Five and Six – Iterate
  14. Fourteenth Class: Step Seven – ZEN Quality Assurance Tests
  15. Fifteenth Class: Step Eight – Phased Production
  16. Sixteenth Class: Conclusion

This course is not about the theory or law of predictive coding. You can easily get that elsewhere. It is about learning the latest methods to do predictive coding. It is about learning how to train an AI to find the ESI evidence you want. The future looks bright for attorneys with both legal knowledge and skills and software knowledge and skills. The best and brightest will also be able to work with various kinds of specialized AI to do a variety of tasks, including AI-enhanced document review. If that is your interest, then jump onto the TAR Course and start your training today. Who knows where it may take you?

________



Judge Peck Orders All Lawyers in NY to Follow the Rules when Objecting to Requests for Production, or Else …

March 5, 2017

Judge Peck has issued a second wake-up call type of opinion in Fischer v. Forrest, _ F. Supp. 3d _, 2017 WL 773694 (S.D.N.Y. Feb. 28, 2017). Judge Peck’s first wake-up call in 2009 had to do with the basics of keyword search – William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134 (S.D.N.Y. 2009), which is still one of my all-time favorite e-discovery opinions. His second wake-up call has to do with the basics of Rule 34, specifically subsections (b)(2)(B) and (b)(2)(C):

It is time, once again, to issue a discovery wake-up call to the Bar in this District: the Federal Rules of Civil Procedure were amended effective December 1, 2015, and one change that affects the daily work of every litigator is to Rule 34. Specifically (and I use that term advisedly), responses to discovery requests must:

* State grounds for objections with specificity;

* An objection must state whether any responsive materials are being withheld on the basis of that objection; and

* Specify the time for production and, if a rolling production, when production will begin and when it will be concluded.

Most lawyers who have not changed their “form file” violate one or more (and often all three) of these changes.

Judge Peck is right about that. But there are so many technical rules that lawyers do not exactly follow. Nothing new here. Boring. Right? Wrong. Why? Because Judge Peck has added teeth to his observation.

It is well known that most lawyers will continue to use their old forms unless they are pried out of their dying hands. Mere changes in the rules and resulting technical violations are not about to interfere with the basic lethargy inherent in legal practice. The law changes so slowly that even big money-saving improvements like predictive coding are met with mere lip-service praise followed by general neglect (hey, it requires learning and change).

So how do you get lawyers’ attention? Judge Peck knows a way: it involves threats. Here is his conclusion in Fischer.

Conclusion
The December 1, 2015 amendments to the Federal Rules of Civil Procedure are now 15 months old. It is time for all counsel to learn the now-current Rules and update their “form” files. From now on in cases before this Court, any discovery response that does not comply with Rule 34’s requirement to state objections with specificity (and to clearly indicate whether responsive material is being withheld on the basis of objection) will be deemed a waiver of all objections (except as to privilege).

Yes. Judge Peck used the “w” word that lawyers all fear – WAIVER. And waiver of all objections, no less. Lawyers love their objections and do not want anyone taking them away. They may even follow the rules to protect them. At least, that is Judge Peck’s thinking.

Rule 34(b)(2)

What does that rule say that so many are thoughtlessly violating? What has made dear Judge Peck so hot under the collar? You can read his opinion to get the answer, and I strongly recommend that you do, but here are the two subsections of Rule 34(b)(2) that no one seems to be following. You might want to read them through carefully a few times.

Rule 34. Producing Documents, Electronically Stored Information, and Tangible Things, or Entering onto Land, for Inspection and Other Purposes

(b) Procedure.

(2) Responses and Objections.

(B) Responding to Each Item. For each item or category, the response must either state that inspection and related activities will be permitted as requested or state with specificity the grounds for objecting to the request, including the reasons. The responding party may state that it will produce copies of documents or of electronically stored information instead of permitting inspection. The production must then be completed no later than the time for inspection specified in the request or another reasonable time specified in the response.

(C) Objections. An objection must state whether any responsive materials are being withheld on the basis of that objection. An objection to part of a request must specify the part and permit inspection of the rest.

As to what that means, you have no higher authority than the official Comments of the Rules Committee itself, which Judge Peck also quotes in full, adding some underlining for emphasis:

Rule 34(b)(2)(B) is amended to require that objections to Rule 34 requests be stated with specificity. This provision adopts the language of Rule 33(b)(4), eliminating any doubt that less specific objections might be suitable under Rule 34. The specificity of the objection ties to the new provision in Rule 34(b)(2)(C) directing that an objection must state whether any responsive materials are being withheld on the basis of that objection. An objection may state that a request is overbroad, but if the objection recognizes that some part of the request is appropriate the objection should state the scope that is not overbroad. Examples would be a statement that the responding party will limit the search to documents or electronically stored information created within a given period of time prior to the events in suit, or to specified sources. When there is such an objection, the statement of what has been withheld can properly identify as matters “withheld” anything beyond the scope of the search specified in the objection.

Rule 34(b)(2)(B) is further amended to reflect the common practice of producing copies of documents or electronically stored information rather than simply permitting inspection. The response to the request must state that copies will be produced. The production must be completed either by the time for inspection specified in the request or by another reasonable time specifically identified in the response. When it is necessary to make the production in stages the response should specify the beginning and end dates of the production.

Rule 34(b)(2)(C) is amended to provide that an objection to a Rule 34 request must state whether anything is being withheld on the basis of the objection. This amendment should end the confusion that frequently arises when a producing party states several objections and still produces information, leaving the requesting party uncertain whether any relevant and responsive information has been withheld on the basis of the objections. The producing party does not need to provide a detailed description or log of all documents withheld, but does need to alert other parties to the fact that documents have been withheld and thereby facilitate an informed discussion of the objection. An objection that states the limits that have controlled the search for responsive and relevant materials qualifies as a statement that the materials have been “withheld.”

2015 Adv. Comm. Notes to Rule 34 (emphasis added by Judge Peck).

Going back to Judge Peck’s analysis of the objections made in Fischer v. Forrest:

Let us count the ways defendants have violated the Rules:

First, incorporating all of the General Objections into each response violates Rule 34(b)(2)(B)’s specificity requirement as well as Rule 34(b)(2)(C)’s requirement to indicate whether any responsive materials are withheld on the basis of an objection. General objections should rarely be used after December 1, 2015 unless each such objection applies to each document request (e.g., objecting to produce privileged material).

Second, General Objection I objected on the basis of non-relevance to the “subject matter of this litigation.” (See page 3 above.) The December 1, 2015 amendment to Rule 26(b)(1) limits discovery to material “relevant to any party’s claim or defense . . . .” Discovery about “subject matter” no longer is permitted. General Objection I also objects that the discovery is not “likely to lead to the discovery of relevant, admissible evidence.” The 2015 amendments deleted that language from Rule 26(b)(1), and lawyers need to remove it from their jargon. See In re Bard IVC Filters Prod. Liab. Litig., 317 F.R.D. 562, 564 (D. Ariz. 2016) (Campbell, D.J.) (“The 2015 amendments thus eliminated the ‘reasonably calculated’ phrase as a definition for the scope of permissible discovery. Despite this clear change, many courts [and lawyers] continue to use the phrase. Old habits die hard. . . . The test going forward is whether evidence is ‘relevant to any party’s claim or defense,’ not whether it is ‘reasonably calculated to lead to admissible evidence.”‘).

Third, the responses to requests 1-2 stating that the requests are “overly broad and unduly burdensome” is meaningless boilerplate. Why is it burdensome? How is it overly broad? This language tells the Court nothing. Indeed, even before the December 1, 2015 rules amendments, judicial decisions criticized such boilerplate objections. See, e.g., Mancia v. Mayflower Textile Servs. Co., 253 F.R.D. 354, 358 (D. Md. 2008) (Grimm, M.J.) (“[B]oilerplate objections that a request for discovery is ‘over[broad] and unduly burdensome, and not reasonably calculated to lead to the discovery of material admissible in evidence,’ persist despite a litany of decisions from courts, including this one, that such objections are improper unless based on particularized facts.” (record cite omitted)).

Finally, the responses do not indicate when documents and ESI that defendants are producing will be produced.

________________


POSTSCRIPT

Attorneys everywhere, not just in Judge Peck’s court in New York, would be well advised to heed his wake-up call. Other courts around the country have already begun to follow in his footsteps. District Court Judge Mark Bennett has bench-slapped all counsel in Liguria Foods, Inc. v. Griffith Laboratories, Inc., 2017 BL 78800 (N.D. Iowa, No. C 14-3041, Mar. 13, 2017), and, like Judge Peck, Judge Bennett warned all attorneys of future sanctions. The opinion can be found on Google Scholar at: https://scholar.google.com/scholar_case?case=13539597862614970677&hl=en&as_sdt=40006

The quick take-aways from the lengthy Liguria Foods opinion are:

  • “Obstructionist discovery responses” in civil cases are a “menacing scourge” that must be met in the future with “substantial sanctions.”
  • Attorneys are addicted to “repetitive discovery objections” that are “devoid of individualized factual analysis.”
  • “Judges need to push back, get our judicial heads out of the sand, stop turning a blind eye to the ‘boilerplate’ discovery culture and do our part to solve this cultural discovery ‘boilerplate’ plague.” 
  • Only by “imposing increasingly severe sanctions” will judges begin to change the culture of discovery abuse.
  • Instead of sanctions, the court accepted the “sincere representations” from the lead attorneys that they will be “ambassadors for changing the ‘boilerplate’ discovery objection culture” in both of their law firms.
  • “NO MORE WARNINGS. IN THE FUTURE, USING ‘BOILERPLATE’ OBJECTIONS TO DISCOVERY IN ANY CASE BEFORE ME PLACES COUNSEL AND THEIR CLIENTS AT RISK FOR SUBSTANTIAL SANCTIONS,” the court said in ALL CAPS as the closing sentence.

This opinion is full of colorful language, not only about bad forms, but also about over-contentious litigation. Judges everywhere are tired of the form objections most attorneys still use, especially when they do not conform to the new rules (or any rules). You should consider becoming the ambassador for changing the ‘boilerplate’ discovery objection culture in your firm.

Judge Bennett’s analysis points to many form objections, including interrogatory objections, not just the “overbroad” objections discussed by Judge Peck under Rule 34 in Fischer. Here are a few of the general boilerplate objections that he points out, all of which look familiar to me:

  • Objection “to the extent they seek to impose obligations on it beyond those imposed by the Federal Rules of Civil Procedure or any other applicable rules or laws.”
  • Objection “to the extent they call for documents protected by the attorney-client privilege, the work product rule, or any other applicable privilege.”
  • Objection “to the extent they request the production of documents that are not relevant, are not reasonably calculated to lead to the discovery of admissible evidence or are not within their possession, custody and control.”
  • Objection “overbroad, unduly burdensome”
  • Objection “insofar as they seek information that is confidential or proprietary.”
  • “subject to [and without waiving] its general and specific objections”
  • Objection “as the term(s) [X and Y] are not defined.”

Here are a few quotes to give you the flavor of how many judges feel about form objections:

This case squarely presents the issue of why excellent, thoughtful, highly professional, and exceptionally civil and courteous lawyers are addicted to “boilerplate” discovery objections.[1] More importantly, why does this widespread addiction continue to plague the litigation industry when counsel were unable to cite a single reported or non-reported judicial decision or rule of civil procedure from any jurisdiction in the United States, state or federal, that authorizes, condones, or approves of this practice? What should judges and lawyers do to substantially reduce or, more hopefully and optimistically, eliminate this menacing scourge on the legal profession? Perhaps surprisingly to some, I place more blame for the addiction, and more promise for a cure, on the judiciary than on the bar.

Indeed, obstructionist discovery practice is a firmly entrenched “culture” in some parts of the country, notwithstanding that it involves practices that are contrary to the rulings of every federal and state court to address them. As I remarked at an earlier hearing in this matter, “So what is it going to take to get . . . law firms to change and practice according to the rules and the cases interpreting the rules? What’s it going to take?”

On January 27, 2017, I entered an Order To Show Cause Why Counsel For Both Parties Should Not Be Sanctioned For Discovery Abuses And Directions For Further Briefing. In the Order To Show Cause, I directed that every attorney for the parties who signed a response to interrogatories or a response to a request for documents in this case, with the exception of local counsel, appear and show cause, at a hearing previously scheduled for March 7, 2017, why he should not be sanctioned for discovery abuses.

Judge Bennett held a long evidentiary hearing before issuing this Order, where he questioned many attorneys for both sides under oath. (Amazing, huh?) This led to the following comments in the Opinion:

As to the question of why counsel for both sides had resorted to “boilerplate” objections, counsel admitted that it had a lot to do with the way they were trained, the kinds of responses that they had received from opposing parties, and the “culture” that routinely involved the use of such “standardized” responses. Indeed, one of the attorneys indicated that some clients—although not the clients in this case—expect such responses to be made on their behalf. I believe that one of the attorneys hit the nail squarely on the head when he asserted that such responses arise, at least in part, out of “lawyer paranoia” not to waive inadvertently any objections that might protect the parties they represent. Even so, counsel for both parties admitted that they now understood that such “boilerplate” objections do not, in fact, preserve any objections. Counsel also agreed that part of the problem was a fear of “unilateral disarmament.” This is where neither party’s attorneys wanted to eschew the standard, but impermissible, “boilerplate” practices that they had all come to use because they knew that the other side would engage in “boilerplate” objections. Thus, many lawyers have become fearful to comply with federal discovery rules because their experience teaches them that the other side would abuse the rules. Complying with the discovery rules might place them at a competitive disadvantage.

Heed these calls and be your law firm’s ambassador for changing the ‘boilerplate’ discovery objection culture. Stop lawyers from making form objections, especially general objections, or risk the wrath of your local judge. Or, to put it another way, using other quaint boilerplate: Please Be Governed Accordingly.

 


e-Discovery Team’s 2016 TREC Report: Once Again Proving the Effectiveness of Our Standard Method of Predictive Coding

February 24, 2017

Our Team’s Final Report of its participation in the 2016 TREC ESI search Conference has now been published online by NIST and can be found here. TREC stands for Text Retrieval Conference. It is co-sponsored by a group within the National Institute of Standards and Technology (NIST), which in turn is an agency of the U.S. Commerce Department. The stated purpose of the annual TREC conference is to encourage research in information retrieval from large text collections.

The other co-sponsor of TREC is the United States Department of Defense. That’s right, the DOD is the official co-sponsor of this event, although TREC almost never mentions that. Can you guess why the DOD is interested? No one talks about it at TREC, but I have some purely speculative ideas. Recall that the NSA is part of the DOD.

We participated in one of several TREC programs in both 2015 and 2016, the one closest to legal search, called the Total Recall Track. The leaders and administrators of this Track were Professors Gordon Cormack and Maura Grossman. They also participated each year in their own Track.

One of the core purposes of all of the Tracks is to demonstrate the robustness of core retrieval technology. Moreover, one of the primary goals of TREC is:

[T]o speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems.

Our participation in TREC in 2015 and 2016 has demonstrated substantial improvements in retrieval methodologies. That is what we set out to do. That is the whole point of the collaboration between the Department of Commerce and Department of Defense to establish TREC.

The e-Discovery Team has a commercial interest in participation in TREC, not a defense or police interest. Although, from what we saw of the FBI’s struggles to search email last year, the federal government needs help. We were very unimpressed by the FBI’s prolonged efforts to review the Clinton email collection. I was one of the few e-discovery lawyers to correctly call the whole Clinton email server “scandal” a political tempest in a teapot. I still do, and I am still outraged by how her email review was handled by the FBI, especially the last-minute “revelations.”

The executive agencies of the federal government have been conspicuously absent from TREC. They seem incapable of effective search, which may well be a good thing. Still, we have to believe that the NSA and other defense agencies are able to do a far better job at large-scale search than the FBI. Consider their ongoing large-scale metadata and text interception efforts, including the once Top Secret PRISM operation. Maybe it is a good thing the NSA does not share its abilities with the FBI, especially these days. Who knows? We certainly will not.

The e-Discovery Team’s commercial interest is to transfer Predictive Coding technology from our research labs into commercial products, namely to transfer our Predictive Coding 4.0 Method using KrolLDiscovery EDR software into commercial products. In our case, at the present time, “commercial products” means our search methods, time and consultations. But who knows, it may be reduced to a robot product someday, like our Mr. EDR.

The e-Discovery Team method can be used on other document review platforms as well, not just Kroll’s, but only if they have strong active machine learning features. Active machine learning is what everyone at TREC was testing, although we appear to have been the only participant to focus on a particular method of operation. And we were the only team led by a practicing attorney, not an academic or software company. (Catalyst also fielded a team in 2015 and 2016, headed by Information Science Ph.D. Jeremy Pickens.)

The e-Discovery Team wanted to test the hybrid multimodal software methods we use in legal search to demonstrate substantial improvements in retrieval methodologies on real-world problems. We have now done so twice, participating in both the 2015 and 2016 Total Recall Tracks. The results in 2016 were even better than 2015. We obtained remarkable results in document review speed, recall and precision; although, as we admit, the search challenges presented at TREC 2016 were easier than most projects we see in legal discovery. Still, to use the quaint language of TREC, we have demonstrated the robustness of our methods and software.

These demonstrations, and all of the reporting and analysis involved, have taken hundreds of hours of our time, but there was no other venue around to test our retrieval methodologies on real-world problems. The demonstrations are now over. We have proven our case. Our standard Predictive Coding method has been tested and its effectiveness demonstrated. No one else has tested and proven their predictive coding methods as we have done. We have proven that our hybrid multimodal method of AI-Enhanced document review is the gold standard. We will continue to make improvements in our method and software, but we are done with participation in federal government programs to prove our standard, even one run by the National Institute of Standards and Technology.

[Diagram: Predictive Coding 4.0]

To prove our point that we have now demonstrated substantial improvements in retrieval methodologies, we quote below Section 5.1 of our official TREC report, but we urge you to read the whole thing. It is 164 pages. This section of our report covers our primary research question only. We investigated three additional research questions not included below.

__________

Section 5.1 First and Primary Research Question

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team’s Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR).

Again, as in the 2015 Total Recall Track, the Team attained very good results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team did so even though it only used five of the eight steps in its usual methodology, intentionally severely constrained the amount of human effort expended on each topic and worked on a dataset stripped of metadata. The Team’s enthusiasm for the record-setting results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult and the TREC relevance judgments had to be corrected in most topics.  …

This next chart uses the corrected standard. It is the primary reference chart we use to measure our results. Unfortunately, it is not possible to make any comparisons with BMI standards because we do not know the order in which the BMI documents were submitted.

[Chart: All results under the corrected (revised) standard, TREC 2016]

The average results obtained across all thirty-four topics at the time of reasonable call using the corrected standard are shown below in bold. The average scores using the uncorrected standard are shown for comparison in parentheses.

  • 88.17% Recall (75.46%)
  • 64.94% Precision (57.12%)
  • 69.15% F1 (57.69%)
  • 124 Docs Reviewed Effort (124)
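A quick aside for readers less familiar with these measures: here is a minimal sketch, with hypothetical counts, of how Recall, Precision and F1 are computed for a single topic. The function and its inputs are illustrative only, not the official TREC scoring code. (Because the figures above are averages across thirty-four topics, the averaged F1 is not simply the harmonic mean of the averaged Recall and Precision.)

```python
# Minimal sketch of the standard retrieval metrics, with hypothetical counts.
# Not the official TREC scoring code.
def recall_precision_f1(true_pos, false_pos, false_neg):
    recall = true_pos / (true_pos + false_neg)          # share of relevant docs that were found
    precision = true_pos / (true_pos + false_pos)       # share of found docs that are relevant
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return recall, precision, f1

# Hypothetical single-topic example: 95 relevant docs found, 5 false hits, 5 missed.
r, p, f1 = recall_precision_f1(true_pos=95, false_pos=5, false_neg=5)
print(f"Recall {r:.2%}  Precision {p:.2%}  F1 {f1:.2%}")  # all 95.00% in this example
```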

At the time of reasonable call the Team had recall scores greater than 90% in twenty-two of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 95% was attained in fourteen topics. These Recall scores under the corrected standard are shown in the below chart. The results are far better than we anticipated, including six topics with total recall – 100%, and two topics with both total recall and perfect precision, topic 417 Movie Gallery and topic 434 Bacardi Trademark.

[Chart: Recall scores under the corrected standard, 2016]

At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 75% in three more topics. Precision of greater than 95% was attained in nine topics. These Precision scores under the corrected standard are shown in the below chart. Again, the results were, in our experience, incredibly good, including three topics with perfect precision at the time of the reasonable call.

[Chart: Precision scores under the corrected standard, 2016]

At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 75% in two more. F1 of greater than 90% was attained in eight topics. These F1 scores under the corrected standard are shown in the below chart. Note there were two topics with a perfect score, Movie Gallery (100%) and Bacardi Trademark (100%) and three more that were near perfect: Felon Disenfranchisement (98.5%), James V. Crosby (97.57%), and Elian Gonzalez (97.1%).

[Chart: F1 scores under the corrected standard, 2016]

We were lucky to attain two perfect scores in 2016 (we attained one in 2015), in topic 417 Movie Gallery and topic 434 Bacardi Trademark. The perfect score of 100% F1 was obtained in topic 417 by locating all 5,945 documents relevant under the corrected standard after reviewing only 66 documents. This topic was filled with form letters and was a fairly simple search.

The perfect score of 100% F1 was obtained in topic 434 Bacardi Trademark by locating all 38 documents relevant under the corrected standard after reviewing only 83 documents. This topic had some legal issues involved that required analysis, but the reviewing attorney, Ralph Losey, is an SME in trademark law so this did not pose any problems. The issues were easy and not critical to understand relevance. This was a simple search involving distinct language and players. All but one of the 38 relevant documents were found by tested, refined keyword search. One additional relevant document was found by a similarity search. Predictive coding searches were run after the keywords searches and nothing new was uncovered. Here machine learning merely performed a quality assurance role to verify that all relevant documents had indeed been found.
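For the curious, here is a rough sketch of what those two non-predictive-coding search types look like in code: a simple keyword hit test followed by a cosine-similarity search that ranks documents by how close they are to a known relevant “seed” document. It is a generic illustration using scikit-learn, not Kroll’s EDR or any other review tool, and the sample documents and the “bacardi” keyword are hypothetical.

```python
# Generic sketch of a keyword search followed by a similarity search.
# Illustration only; not the software or queries actually used at TREC.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [                                   # hypothetical document texts
    "Re: Bacardi trademark license renewal terms and conditions",
    "Weekly status report, nothing new on the branding project",
    "Bacardi brand usage guidelines attached for your review",
]

# Keyword search: a simple case-insensitive hit test.
keyword_hits = [i for i, text in enumerate(documents) if "bacardi" in text.lower()]

# Similarity search: rank every document by cosine similarity to a seed document
# that the reviewer has already judged relevant.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)
seed = keyword_hits[0]
similarities = cosine_similarity(X[seed], X).ravel()
ranked = similarities.argsort()[::-1]           # most similar first (the seed itself ranks first)
print("Keyword hits:", keyword_hits, "Similar to seed:", ranked[1:].tolist())
```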

The Team proved once again, as it did in 2015, that perfect recall and perfect precision is possible, albeit rare, using the Team’s methods and fairly simple search projects.

The Team’s top ten projects attained remarkably high scores with an average Recall of 95.66%, average Precision of 97.28% and average F-Measure: 96.42%. The top ten are shown in the chart below.

[Chart: Top ten projects’ results]

In addition to Recall, Precision and F1, the Team per TREC requirements also measured the effort involved in each topic search. We measured effort by the number of documents that were actually human-reviewed prior to submission and coded relevant or irrelevant. We also measured effort by the total human time expended for each topic. Overall, the Team human-reviewed only 6,957 documents to find all the 34,723 relevant documents within the overall corpus of 9,863,366 documents. The total time spent by the Team to review the 6,957 documents, and do all the search and analysis and other work using our Hybrid Multimodal Predictive Coding 4.0 method, was 234.25 hours.

[Pie chart: documents reviewed versus total corpus, 2016]

It is typical in legal search to try to measure the efficiency of a document review by the number of documents classified by an attorney in an hour. For instance, a typical contract review attorney can read and classify an average of 50 documents per hour. The Team classified 9,863,366 documents by review of 6,957 documents taking a total time of 234.25 hours. The Team’s overall review rate for the entire corpus was thus 42,106 files per hour (9,863,366/234.25).

In legal search it is also typical, indeed mandatory, to measure the costs of review and bill clients accordingly. If we here assume a high attorney hourly rate of $500 per hour, then the total cost of the review of all 34 Topics would be $117,125. That is a cost of just over $0.01 per document. In a traditional legal review, where a lawyer reviews one document at a time, the cost would be far higher. Even if you assume a low attorney rate of $50 per hour, and review speed of 50 files per hour, the total cost to review every document for every issue would be $9,863,366. That is a cost of $1.00 per document, which is actually low by legal search standards.

Analysis of project duration is also very important in legal search. Instead of the 234.25 hours expended by our Team using Predictive Coding 4.0, traditional linear review would have taken 197,267 hours (9,863,366/50). In other words, the review of thirty-four projects, which we did in our part-time after work in one Summer, would have taken a team of two lawyers using traditional methods, 8 hours a day, every day, over 33 years! These kinds of comparisons are common in Legal Search.
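Readers who want to check this arithmetic can reproduce it in a few lines. The figures come from the report text above; the $500 and $50 hourly rates and the 50-documents-per-hour linear review speed are the report’s own stated assumptions, not independent data.

```python
# Back-of-the-envelope reproduction of the cost and speed comparison above.
# All inputs are the figures and assumptions stated in the surrounding text.
corpus_size = 9_863_366        # documents in the collection
team_hours = 234.25            # total attorney hours across all 34 topics

review_rate = corpus_size / team_hours              # ~42,106 files per hour
tar_cost = team_hours * 500                         # $117,125 at $500/hour
tar_cost_per_doc = tar_cost / corpus_size           # ~$0.012 per document

linear_hours = corpus_size / 50                     # ~197,267 hours at 50 docs/hour
linear_cost = linear_hours * 50                     # $9,863,366 at $50/hour ($1.00/doc)
years_two_lawyers = linear_hours / (2 * 8 * 365)    # ~33.8 years, two lawyers, 8 hours/day

print(f"{review_rate:,.0f} files/hour; TAR cost ${tar_cost:,.0f} "
      f"(${tar_cost_per_doc:.3f}/doc); linear review ${linear_cost:,.0f} "
      f"over about {years_two_lawyers:.1f} years for two lawyers")
```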

Detailed descriptions of the searches run in all thirty-four topics are included in the Appendix.

___________

We also reproduce below Section 1.0, Summary of Team Efforts, from our 2016 TREC Report. For more information on what we learned in the 2016 TREC, see also Complete Description in 30,114 Words and 10 Videos of the e-Discovery Team’s “Predictive Coding 4.0” Method of Electronic Document Review. The nine new insights that we learned in the 2016 research are summarized in the diagram below and described more specifically in that article.

[Diagram: Nine insights from the 2016 research]

_________

Excerpt From Team’s 2016 Report

1.1 Summary of Team’s Efforts. The e-Discovery Team’s 2016 Total Recall Track Athome project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic the Team classified 9,863,366 documents in thirty-four review projects.

The topics searched in 2016 and their issue names are shown in the chart below. Also included are the first names of the e-Discovery Team member who did the review for that topic, the total time spent by that reviewer and the number of documents manually reviewed to find all of the relevant documents in that topic. The total time of all reviewers on all projects was 234.25 hours. All relevant documents, totaling 34,723 by Team count, were found by manual review of 6,957 documents. The thirteen topics in red were considered mandatory by TREC and the remaining twenty-one were optional. The e-Discovery Team did all topics.

[Chart: 2016 topics, reviewers, hours, and documents reviewed]

They were all one-person, solo efforts, although there was coordination and communications between Team members on the Subject Matter Expert (SME) type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for many of these topics. A detailed description of the search for each topic is contained in the Appendix.

In each topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as Irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance, determinations that took into consideration and gave some deference to the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussions between Team members as to the unexpected TREC opinions on relevance.

This contrasts sharply with participants in the Sandbox division. They never make any effort to determine where their software made errors in predicting relevance, or for any other reasons. They accept as a matter of faith the correctness of all TREC’s prior assessment of relevance. To these participants, who were all academic institutions, the ground truth itself as to relevance or not, was of no relevance. Apparently, that did not matter to their research.

All thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically faces as attorneys leading legal document review projects. (If the Bush email had not been altered by omission of metadata, the searches would have been even easier.) The details of the searches performed in each of the thirty-four topics are included in the Appendix. The search challenges presented by these topics were roughly equivalent to the most simplistic challenges that the e-Discovery Team might face in projects involving relatively simple legal disputes. A few of the search topics in 2016 included quasi-legal issues, more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed some, albeit very limited, testing of legal judgment and analysis in the determination of true relevance in these topics. In legal search relevancy, legal analysis skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

At Home participants were asked to track and report their manual efforts. The e-Discovery Team did this by recording the number of documents that were human-reviewed and classified prior to submission. More were reviewed after submission as part of the Team’s TREC relevance checking. Virtually all documents human-reviewed were also classified, although not all documents classified were used for active training of the software classifier. The Team also tracked effort by the number of attorney hours worked, as is traditional in legal services. Although the amount of time varied somewhat by topic, the average time spent per topic was only 6.89 hours. The average review and classification speed for each project was 42,106 files per hour (9,863,366/234.25).

Again, for the full picture and complete details of our work please see the complete 164 page report to TREC of the e-Discovery Team’s Participation in the 2016 Total Recall Track.

 

 

 

 

