TAR Course Expands Again: Standardized Best Practice for Technology Assisted Review

February 11, 2018

The TAR Course has a new class, the Seventeenth Class: Another “Player’s View” of the Workflow. Several other parts of the Course have been updated and edited. It now has Eighteen Classes (listed at end). The TAR Course is free and follows the Open Source tradition. We freely disclose the method for electronic document review that uses the latest technology tools for search and quality controls. These technologies and methods empower attorneys to find the evidence needed for all text-based investigations. The TAR Course shares the state of the art for using AI to enhance electronic document review.

The key is to know how to use the document review search tools that are now available to find the targeted information. We have been working on various methods of use since our case before Judge Andrew Peck in Da Silva Moore in 2012. After we helped get the first judicial approval of predictive coding in Da Silva, we began a series of several hundred document reviews, both in legal practice and scientific experiments. We have now refined our method many times to attain optimal efficiency and effectiveness. We call our latest method Hybrid Multimodal IST Predictive Coding 4.0.

The Hybrid Multimodal method taught by the TARcourse.com combines law and technology. Successful completion of the TAR course requires knowledge of both fields. In the technology field active machine learning is the most important technology to understand, especially the intricacies of training selection, such as Intelligently Spaced Training (“IST”). In the legal field the proportionality doctrine is key to the  pragmatic application of the method taught at TAR Course. We give-away the information on the methods, we open-source it through this publication.

All we can transmit by online teaching is information, and a small bit of knowledge. Knowing the Information in the TAR Course is a necessary prerequisite for real knowledge of Hybrid Multimodal IST Predictive Coding 4.0. Knowledge, as opposed to Information, is taught the same way as advanced trial practice, by second chairing a number of trials. This kind of instruction is the one with real value, the one that completes a doc review project at the same time it completes training. We charge for document review and throw in the training. Information on the latest methods of document review is inherently free, but Knowledge of how to use these methods is a pay to learn process.

The Open Sourced Predictive Coding 4.0 method is applied for particular applications and search projects. There are always some customization and modifications to the default standards to meet the project requirements. All variations are documented and can be fully explained and justified. This is a process where the clients learn by doing and following along with Losey’s work.

What he has learned through a lifetime of teaching and studying Law and Technology is that real Knowledge can never be gained by reading or listening to presentations. Knowledge can only be gained by working with other people in real-time (or near-time), in this case, to carry out multiple electronic document reviews. The transmission of knowledge comes from the Q&A ESI Communications process. It comes from doing. When we lead a project, we help students to go from mere Information about the methods to real Knowledge of how it works. For instance, we do not just make the Stop decision, we also explain the decision. We share our work-product.

Knowledge comes from observing the application of the legal search methods in a variety of different review projects. Eventually some Wisdom may arise, especially as you recover from errors. For background on this triad, see Examining the 12 Predictions Made in 2015 in “Information → Knowledge → Wisdom” (2017). Once Wisdom arises some of the sayings in the TAR Course may start to make sense, such as our favorite “Relevant Is Irrelevant.” Until this koan is understood, the legal doctrine of Proportionality can be an overly complex weave.

The TAR Course is now composed of eighteen classes:

  1. First Class: Background and History of Predictive Coding
  2. Second Class: Introduction to the Course
  3. Third Class:  TREC Total Recall Track, 2015 and 2016
  4. Fourth Class: Introduction to the Nine Insights from TREC Research Concerning the Use of Predictive Coding in Legal Document Review
  5. Fifth Class: 1st of the Nine Insights – Active Machine Learning
  6. Sixth Class: 2nd Insight – Balanced Hybrid and Intelligently Spaced Training (IST)
  7. Seventh Class: 3rd and 4th Insights – Concept and Similarity Searches
  8. Eighth Class: 5th and 6th Insights – Keyword and Linear Review
  9. Ninth Class: 7th, 8th and 9th Insights – SME, Method, Software; the Three Pillars of Quality Control
  10. Tenth Class: Introduction to the Eight-Step Work Flow
  11. Eleventh Class: Step One – ESI Communications
  12. Twelfth Class: Step Two – Multimodal ECA
  13. Thirteenth Class: Step Three – Random Prevalence
  14. Fourteenth Class: Steps Four, Five and Six – Iterative Machine Training
  15. Fifteenth Class: Step Seven – ZEN Quality Assurance Tests (Zero Error Numerics)
  16. Sixteenth Class: Step Eight – Phased Production
  17. Seventeenth Class: Another “Player’s View” of the Workflow (class added 2018)
  18. Eighteenth Class: Conclusion

With a lot of hard work you can complete this online training program in a long weekend, but most people take a few weeks. After that, this course can serve as a solid reference to consult during complex document review projects. It can also serve as a launchpad for real Knowledge and eventually some Wisdom into electronic document review. TARcourse.com is designed to provide you with the Information needed to start this path to AI enhanced evidence detection and production.

 


The Great Debate in AI Ethics Surfaces on Social Media: Elon Musk v. Mark Zuckerberg

August 6, 2017

I am a great admirer of both Mark Zuckerberg and Elon Musk. That is one reason why the social media debate last week between them concerning artificial intelligence, a subject also near and dear, caused such dissonance. How could they disagree on such an important subject? This blog will lay out the “great debate.”

It is far from a private argument between Elon and Mark.  It is a debate that percolates throughout scientific and technological communities concerned with AI. My sister AI-Ethics.com web begins with this debate. If you have not already visited this web, I hope you will do so after reading this blog. It begins by this same debate review. You will also see at AI-Ethics.com that I am seeking volunteers to help: (1) prepare a scholarly article on the AI Ethics Principles already created by other groups; and, (2) research the viability of sponsoring an interdisciplinary conference on AI Principles. For more background on these topics see the library of suggested videos found at AI-Ethics Videos. They provide interesting, easy to follow (for the most part), reliable information on artificial intelligence. This is something that everybody should know at least something about if they want to keep up with ever advancing technology. It is a key topic.

The Debate Centers on AI’s Potential for Superintelligence

The debate arises out of an underlying agreement that artificial intelligence has the potential to become smarter than we are, superintelligent. Most experts agree that super-evolved AI could become a great liberator of mankind that solves all problems, cures all diseases, extends life indefinitely and frees us from drudgery. Then out of that common ebullient hope arises a small group that also sees a potential dystopia. These utopia party-poopers fear that a super-evolved AI could doom us all to extinction, that is, unless we are not careful. So both sides of the future prediction scenarios agree that many good things are possible, but, one side insists that some very bad things are also possible, that the dark side risks even include extinction of the human species.

The doomsday scenarios are a concern to some of the smartest people alive today, including Stephen Hawking, Elon Musk and Bill Gates. They fear that superintelligent AIs could run amuck without appropriate safeguards. As stated, other very smart people strongly disagree with all doomsday fears, including Mark Zuckerberg.

Mark Zuckerberg’s company, Facebook, is a leading researcher in the field of general AI. In a backyard video that Zuckerberg made live on Facebook on July 24, 2017, with six million of his friends watching on, Mark responded to a question from one: “I watched a recent interview with Elon Musk and his largest fear for future was AI. What are your thoughts on AI and how it could affect the world?”

Zuckerberg responded by saying:

I have pretty strong opinions on this. I am optimistic. I think you can build things and the world gets better. But with AI especially, I am really optimistic. And I think people who are naysayers and try to drum up these doomsday scenarios — I just, I don’t understand it. It’s really negative and in some ways I actually think it is pretty irresponsible.

In the next five to 10 years, AI is going to deliver so many improvements in the quality of our lives.

Zuckerberg said AI is already helping diagnose diseases and that the AI in self-driving cars will be a dramatic improvement that saves many lives. Zuckerberg elaborated on his statement as to naysayers like Musk being irresponsible.

Whenever I hear people saying AI is going to hurt people in the future, I think yeah, you know, technology can generally always be used for good and bad, and you need to be careful about how you build it and you need to be careful about what you build and how it is going to be used.

But people who are arguing for slowing down the process of building AI, I just find that really questionable. I have a hard time wrapping my head around that.

Mark’s position is understandable when you consider his Hacker Way philosophy where Fast and Constant Improvements are fundamental ideas. He did, however, call Elon Musk “pretty irresponsible” for pushing AI regulations. That prompted a fast response from Elon the next day on Twitter. He responded to a question he received from one of his followers about Mark’s comment and said: I’ve talked to Mark about this. His understanding of the subject is limited. Elon Musk has been thinking and speaking up about this topic for many years. Elon also praises AI, but thinks that we need to be careful and consider regulations.

The Great AI Debate

In 2014 Elon Musk referred to developing general AI as summoning the demon. He is not alone in worrying about advanced AI. See eg. Open-AI.com and CSER.org. Steven Hawking, usually considered the greatest genius of our time, has also commented on the potential danger of AI on several occasions. In speech he gave in 2016 at Cambridge marking the opening of the Center for the Future of Intelligence, Hawking said: “In short, the rise of powerful AI will be either the best, or the worst thing, ever to happen to humanity. We do not yet know which.” Here is Hawking’s full five minute talk on video:

Elon Musk warned state governors on July 15, 2017 at the National Governors Association Conference about the dangers of unregulated Artificial Intelligence. Musk is very concerned about any advanced AI that does not have some kind of ethics programmed into its DNA. Musk said that “AI is a fundamental existential risk for human civilization, and I don’t think people fully appreciate that.” He went on to urge the governors to begin investigating AI regulation now: “AI is a rare case where we need to be proactive about regulation instead of reactive. Because I think by the time we are reactive in AI regulation, it’s too late.”

Bill Gates agrees. He said back in January 2015 that

I am in the camp that is concerned about super intelligence. First the machines will do a lot of jobs for us and not be super intelligent. That should be positive if we manage it well. A few decades after that though the intelligence is strong enough to be a concern. I agree with Elon Musk and some others on this and don’t understand why some people are not concerned.

Elon Musk and Bill Gates spoke together on the Dangers of Artificial Intelligence at an event in China in 2015. Elon compared work on the AI to work on nuclear energy and said it was just as dangerous as nuclear weapons. He said the right emphasis should be on AI safety, that we should not be rushing into something that we don’t understand. Statements like that makes us wonder what Elon Musk knows that Mark Zuckerberg does not?

Bill Gates at the China event responded by agreeing with Musk. Bill also has some amusing, interesting statements about human wet-ware, our slow brain algorithms. He spoke of our unique human ability to take experience and turn it into knowledge. See: Examining the 12 Predictions Made in 2015 in “Information → Knowledge → Wisdom. Bill Gates thinks that as soon as machines gain this ability, they will almost immediately move beyond the human level of intelligence. They will read all the books and articles online, maybe also all social media and private mail. Bill has no patience for skeptics of the inherent danger of AI: How can they not see what a huge challenge this is?

Gates, Musk and Hawking are all concerned that a Super-AI using computer connections, including the Internet, could take actions of all kinds, both global and micro. Without proper standards and safeguards they could modify conditions and connections before we even knew what they were doing. We would not have time to react, nor the ability to react, unless certain basic protections are hardwired into the AI, both in silicon form and electronic algorithms. They all urge us to take action now, rather than wait and react.

To close out the argument for those who fear advanced AI and urge regulators to start thinking about how to restrain it now, consider the Ted Talk by Sam Harris on October 19, 2016, Can we build AI without losing control over it? Sam, a neuroscientist and writer, has some interesting ideas on this.

On the other side of the debate you will find most, but not all, mainstream AI researchers. You will also find many technology luminaries, such as Mark Zuckerberg and Ray Kurzweil. They think that the doomsday concerns are pretty irresponsible. Oren Etzioni, No, the Experts Don’t Think Superintelligent AI is a Threat to Humanity (MIT Technology Review, 9/20/16); Ben Sullivan, Elite Scientists Have Told the Pentagon That AI Won’t Threaten Humanity (Motherboard 1/19/17).

You also have famous AI scholars and researchers like Pedro Domingos who are skeptical of all superintelligence fears, even of AI ethics in general. Domingos stepped into the Zuckerberg v. Musk social media dispute by siding with Zuckerberg. He told Wired on July 17, 2017 that:

Many of us have tried to educate him (meaning Musk) and others like him about real vs. imaginary dangers of AI, but apparently none of it has made a dent.

Tom Simonite, Elon Musk’s Freak-Out Over Killer Robots Distracts from Our Real AI Problems, (Wired, 7/17/17).

Domingos also famously said in his book, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, a book which we recommend:

People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.

We can relate with that. On the question of AI ethics Professor Domingos said in a 2017 University of Washington faculty interview:

But Domingos says that when it comes to the ethics of artificial intelligence, it’s very simple. “Machines are not independent agents—a machine is an extension of its owner—therefore, whatever ethical rules of behavior I should follow as a human, the machine should do the same. If we keep this firmly in mind,” he says, “a lot of things become simplified and a lot of confusion goes away.” …

It’s only simple so far as the ethical spectrum remains incredibly complex, and, as Domingos will be first to admit, everybody doesn’t have the same ethics.

“One of the things that is starting to worry me today is that technologists like me are starting to think it’s their job to be programming ethics into computers, but I don’t think that’s our job, because there isn’t one ethics,” Domingos says. “My job isn’t to program my ethics into your computer; it’s to make it easy for you to program your ethics into your computer without being a programmer.”

We agree with that too. No one wants technologists alone to be deciding ethics for the world. This needs to be a group effort, involving all disciplines, all people. It requires full dialogue on social policy, ultimately leading to legal codifications.

The Wired article of Jul 17, 2017, also states Domingos thought it would be better not to focus on far-out superintelligence concerns, but instead:

America’s governmental chief executives would be better advised to consider the negative effects of today’s limited AI, such as how it is giving disproportionate market power to a few large tech companies.

The same Wired article states that Iyad Rahwan, who works on AI and society at MIT, doesn’t deny that Musk’s nightmare scenarios could eventually happen, but says attending to today’s AI challenges is the most pragmatic way to prepare. “By focusing on the short-term questions, we can scaffold a regulatory architecture that might help with the more unpredictable, super-intelligent AI scenarios.” We agree, but are also inclined to think we should at least try to do both at the same time. What if Musk, Gates and Hawking are right?

The Wired article also quotes, Ryan Callo, a Law Professor at the University of Washington, as saying in response to the Zuckerberg v Musk debate:

Artificial intelligence is something policy makers should pay attention to, but focusing on the existential threat is doubly distracting from it’s potential for good and the real-world problems it’s creating today and in the near term.

Simonite, Elon Musk’s Freak-Out Over Killer Robots Distracts from Our Real AI Problems, (Wired, 7/17/17).

But how far-out from the present is superintelligence? For a very pro-AI view, one this is not concerned with doomsday scenarios, consider the ideas of Ray Kurzweil, Google’s Director of Engineering. Kurzweil thinks that AI will attain human level intelligence by 2019, but will then mosey along and not attain super-intelligence, which he calls the Singularity, until 2045.

2029 is the consistent date I have predicted for when an AI will pass a valid Turing test and therefore achieve human levels of intelligence. I have set the date 2045 for the ‘Singularity’ which is when we will multiply our effective intelligence a billion fold by merging with the intelligence we have created.

Kurzweil is not worried about the impact of super-intelligent AI. To the contrary, he looks forward to the Singularity and urges us to get ready to merge with the super-AIs when this happens. He looks at AI super-intelligence as an opportunity for human augmentation and immortality. Here is a video interview in February 2017 where Kurzweil responds to fears by Hawking, Gates, and Musk about the rise of strong A.I.

Note Ray conceded the concerns are valid, but thinks they miss the point that AI will be us, not them, that humans will enhance themselves to super-intelligence level by integrating with AI – the Borg approach (our words, not his).

Getting back to the more mainstream defenses of super-intelligent AI, consider Oren Etzioni’s Ted Talk on this topic.

Oren Etzioni thinks AI has gotten a bad rap and is not an existential threat to the human race. As the video shows, however, even Etzioni is concerned about autonomous weapons and immediate economic impacts. He invited everyone to join him and advocate for the responsible use of AI.

Conclusion

The responsible use of AI is a common ground that we can all agree upon. We can build upon and explore that ground with others at many venues, including the new one I am trying to put together at AI-Ethics.com. Write me if you would like to be a part of that effort. Our first two projects are: (1) to research and prepare a scholarly paper of the many principles proposed for AI Ethics by other groups; and (2) put on a conference dedicated to dialogue on AI Ethics principles, not a debate. See AI-Ethics.com for more information on these two projects. Ultimately we hope to mediate model recommendations for consideration by other groups and regulatory bodies.

AI-Ethics.com is looking forward to working with non-lawyer technologists, scientists and others interested in AI ethics. We believe that success in this field depends on diversity. It has to be very interdisciplinary to succeed. Lawyers should be included in this work, but we should remain a minority. Diversity is key here. We will even allows AIs, but first they must pass a little test you may have heard of.  When it comes to something as important all this, all faces should be in the book, including all colors, races, sexes, nationalities, education, from all interested companies, institutions, foundations, governments, agencies, firms and teaching institutions around the globe. This is a human effort for a good AI future.

 

 



Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow – Part Five

October 9, 2016

predictive_coding_quality_triangleThis is the fifth installment of the article explaining the e-Discovery Team’s latest enhancements to electronic document review using Predictive Coding. Here are Parts OneTwoThree and Four. This series explains the nine insights behind the latest upgrade to version 4.0 and the slight revisions these insights triggered to the eight-step workflow. We have already covered five of the nine insights. In this installment we will cover the remaining four: GIGO & QC (Garbage In, Garbage Out) (Quality Control); SME (Subject Matter Expert); Method (for electronic document review); and, Software (for electronic document review). The last three: SME – Method – Software, are all parts of Quality Control.

GIGO & QC – Garbage In, Garbage Out & Quality Control

Garbage In, Garbage Out is one of the oldest sayings in the computer world. You put garbage into the computer and it will spit it back at you in spades. It is almost as true today as it was in the 1980s when it was first popularized. Smart technology that recognizes and corrects for some mistakes has tempered GIGO somewhat, but it still remains a controlling principle of computer usage.

garbage-in-garbage-out

The GIGO Wikipedia entry explains that:

GIGO in the field of computer science or information and communications technology refers to the fact that computers, since they operate by logical processes, will unquestioningly process unintended, even nonsensical, input data (“garbage in”) and produce undesired, often nonsensical, output (“garbage out”). … It was popular in the early days of computing, but applies even more today, when powerful computers can produce large amounts of erroneous information in a short time.

Wikipedia also pointed out an interesting new expansion of the GIGO Acronym, Garbage In, Gospel Out:

It is a sardonic comment on the tendency to put excessive trust in “computerized” data, and on the propensity for individuals to blindly accept what the computer says.

Now as to our insight: GIGO in electronic document review, especially review using predictive coding, is largely the result of human error on the part of the Subject Matter Expert. Of course, garbage can also be created by poor methods, where too many mistakes are made, and by poor software. But to really mess things up, you need a clueless SME. These same factors also create garbage (poor results) when used with any document review techniques. When the subject matter expert is not good, when he or she does not have a good grasp for what is relevant, and what is important for the case, then all methods fail. Keywords and active machine learning both depend on reliable attorney expertise. Quality control literally must start at the top of any electronic document review project. It must start with the SME.

Missed_target

If your attorney expert, your SME, has no clue, their head is essentially garbage. With that kind of bad input, you will inevitably get bad output. This happens with all usages of a computer, but especially when using predictive coding. The computer learns what you teach it. Teach it garbage and that is what it will learn. It will hit a target all right. Just not the right target. Documents will be produced, just not the ones needed to resolve the disputed issues. A poor SME makes too many mistakes and misses too many relevant documents because they do not know what is relevant and what is not.

Robot_BYTEA smart AI can correct for some human errors (perfection is not required). The algorithms can correct for some mistakes in consistency by an SME, and the rest of the review team, but not that many. In machine learning for document review the legal review robot now starts as a blank slate with no knowledge of the law or the case. They depend on the SME to teach them. Someday that may change. We may see smart robots who know the law and relevance, but we are not even near there yet. For now our robots are more like small children. They only know what you tell them, but they can spot inconsistencies in your message and they never forget.

Subject Matter Expert – SME

The predictive coding method can fail spectacularly with a poor expert, but so can keyword search. The converse of both propositions is also true. In all legal document review projects the SME needs to be an expert in scope of relevance, what is permitted discovery, what is relevant and what is not, what is important and what is not. They need to know the legal rules governing relevance backwards and forwards. They also need to have a clear understanding of the probative value of evidence in legal proceedings. This is what allows an attorney to know the scope of discoverable information.

relevance_scope_2016

If the attorney in charge does not understand the scope of discoverable information, does not understand probative value, then the odds of finding the documents important to a case are significantly diminished. You could look at a document with high probative value and not even know that it is relevant. This is exactly the concern of many requesting parties, that the responding party’s attorney will not understand relevance and discoverability the same way they do. That is why the first step in my recommended work flow is to Talk, which I also call Relevance Dialogues.

The kind of ESI communications with opposing counsel that are needed is not whining accusations or aggressive posturing. I will go into good talk versus bad talk in some detail when I explain the first step of our eight-step method. The point of the talking that should begin any document review project is to get a common understanding of scope of discoverable information. What is the exact scope of the request for production? Don’t agree the scope is proportionate? That’s fine. Agree to disagree and Talk some more, this time to the judge.

We have seen firsthand in the TREC experiments the damage  that can be done by a poor SME and no judge to keep them inline. Frankly, it has been something of a shock, or wake up call, as to the dangers of poor SME relevance calling. Most of the time I am quite lucky in my firm of super-specialists (all we do is employment law matters) to have terrific SMEs. But I have been a lawyer for a long time. I have seen some real losers in this capacity in the past 36 years. I myself have been a poor SME in some of the 2015 TREC experiments. An example that comes to mind is when I had to be the SME on the subject of CAPTCHA in a collection of forum messages by hackers. It ended up being on the job training. I saw for myself how little I could do to guide the project. Weak SMEs make bad leaders in the world of technology and law.

captcha

spoiled brat becomes an "adult"There are two basic ways that discovery SMEs fail. First, there are the kind who do not really know what they are talking about. They do not have expertise in the subject matter of the case, or, let’s be charitable, their expertise is insufficient. A bullshit artist makes a terrible SME. They may fool the client (and they often do), but they do not fool the judge or any real experts. The second kind of weak SMEs have some expertise, but they lack experience. In my old firm we used to call them baby lawyers. They have knowledge, but not wisdom. They lack the practical experience and skills that can only come from grappling with these relevance issues in many cases.

That is one reason why boutique law firms like my own do so well in today’s competitive environment. They have the knowledge and the wisdom that comes from specialization. They have seen it before and know what to do. Knowledge_Information_Wisdom

An SME with poor expertise has a very difficult time knowing if a document is relevant or not. For instance, a person not living in Florida might have a very different understanding than a Floridian of what non-native plants and animals threaten the Florida ecosystem. This was Topic 408 in TREC 2016 Total Recall Track. A native Floridian is in a better position to know the important invasive species, even ones like vines that have been in the state for over a hundred years. A non-expert with only limited information may not know, for instance, that Kudzo vines are an invasive plant from Japan and China. (They are also rumored to be the home of small, vicious Kudzo monkeys!) What is known for sure is that Kudzu, Pueraria montana, smothers all other vegetation around, including tall trees (shown below). A native Floridian hates Kudzo as much as they love Manatees.

kudzu

A person who has just visited Florida a few times would not know what a big deal Kudzo was in Florida during the Jeb Bush administration, especially in Northern Florida. (Still is.) They had probably never heard of it at all. They could see email with the term and have no idea what the email meant. It is obvious the native SME would know more, and thus be better positioned than a fake-SME, to determine Jeb Bush email relevance to non-native plants and animals that threaten the Florida ecosystem. By the way, all native Floridians especially hate pythons and a python eating one of our gators as shown below is an abomination.

python

Expertise is obviously needed for anyone to be a subject matter expert and know the difference between relevant and irrelevant. But there is more to it than information and knowledge. It also takes experience. It takes an attorney who has handled these kinds of cases many times before. Preferably they have tried a case like the one you are working on. They have seen the impact of this kind of evidence on judge and jury. An attorney with both theoretical knowledge and practical experience makes the best SME. Your ability to contribute subject matter expertise is limited when you have no practical experience. You might think certain ESI is helpful, when in fact, it is not; it has only weak probative value. A document might technically be relevant, but the SME lacks the experience and wisdom to know that matter is practically irrelevant anyway.

It goes without saying that any SME needs a good review team to back them up, to properly, consistently implement their decisions. In order for good leadership to be effective, there must also be good project management. Although this insight discussion features the role of the SME member of the review team, that is only because the importance of the SME was recently emphasized to us in our TREC research. In actuality all team members are important, not just the input from the top. Project management is critical, which is an insight already well-known to us and, we think, the entire industry.

Corrupt SMEs

Star_wars_emperor

Beware evil SMEs

Of course, no SME can be effective, no matter what their knowledge and experience, if they are not fair and honest. The SME must impartially seek and produce documents that are both pro and con. This is an ethics issue in all types of document review, not just predictive coding. In my experience corrupt SMEs are rare. But it does happen occasionally, especially when a corrupt client pressures their all too dependent attorneys. It helps to know the reputation for honesty of your opposing counsel. See: Five Tips to Avoid Costly Mistakes in Electronic Document Review Part 2 that contains my YouTube video, E-DISCOVERY ETHICS (below).

Also see: Lawyers Behaving Badly: Understanding Unprofessional Conduct in e-Discovery, 60 Mercer L. Rev. 983 (Spring 2009); Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.).

If I were a lawyer behaving badly in electronic document review, like for instance the Qualcomm lawyers did hiding thousands of highly relevant emails from Broadcom, I would not use predictive coding. If I wanted to not find evidence harmful to my case, I would use negotiated keyword search, the Go Fish kind. See Part Four of this series.

looking for droids in all the wrong places

I would also use linear review and throw an army of document review attorneys at it, with no one really knowing what the other was doing (or coding). I would subtly encourage project mismanagement. I would not pay attention. I would not supervise the rest of the team. I would not involve an AI entity,  i.w.- active machine learning. I would also not use an attorney with search expertise, nor would I use a national e-discovery vendor. I would throw a novice at the task and use a local or start-up vendor who would just do what they were told and not ask too many questions.

sorry_dave_ai

A corrupt hide-the-ball attorney would not want to use a predictive coding method like ours. They would not want the relevant documents produced or logged that disclose the training documents they used. This is true in any continuous training process, not just ours. We do not produce irrelevant documents, the law prevents that and protects our client’s privacy rights. But we do produce relevant documents, usually in phases, so you can see what the training documents are.

Star Wars Obi-WanA Darth Vader type hide-the-ball attorney would also want to avoid using a small, specialized, well-managed team of contract review lawyers to assist on a predictive coding project the review project. They would instead want to work with a large, distant army of contract lawyers. A small team of contract review attorneys cannot be brought into the con, especially if they are working for a good vendor. Even if you handicap them with a bad SME, and poor methods and software, they may still find a few of the damaging documents you do not want to produce. They may ask questions when they learn their coding has been changed from relevant to irrelevant. I am waiting for the next Qualcomm or Victor Stanley type case where a contract review lawyer blows the whistle on corrupt review practices. Qualcomm Inc. v. Broadcom Corp., No. 05-CV-1958-B(BLM) Doc. 593 (S.D. Cal. Aug. 6, 2007) (one honest low-level engineer testifying at trial blew the whistle on Qualcomm’s massive fraud to hide critical email evidence). I have heard stories from contract review attorneys, but the law provides them too little protection, and so far at least, they remain behind the scenes with horror stories.

One protection against both a corrupt SME, and SME with too little expertise and experience, is for the SME to be the attorney in charge of the trial of the case, or at least one who works closely with them so as to get their input when needed. The job of the SME is to know relevance. In the law that means you must know how the ultimate arbitrator of relevance will rule – the judge assigned to your case. They determine truth. An SME’s own personal opinion is important, but ultimately of secondary importance to that of the judge. For that reason a good SME will often vary on the side of over-expansive relevance because they know from history that is what the judge is likely to allow in this type of case.

Judges-Peck_Grimm_FacciolaThis is a key point. The judges, not the attorneys, ultimately decide on close relevance and related discoverability issues. The head trial attorney interfaces with the judge and opposing counsel, and should have the best handle on what is or is not relevant or discoverable. A good SME can predict the judge’s rulings and, even if not perfect, can gain the judicial guidance needed in an efficient manner.

If the judge detects unethical conduct by the attorneys before them, including the attorney signing the Rule 26(g) response, they can and should respond harshly to punish the attorneys. See eg: Victor Stanley, Inc. v. Creative Pipe, Inc., 269 F.R.D. 497, 506 (D. Md. 2010). The Darth Vader’s of the world can be defeated. I have done it many times with the help of the presiding judge. You can too. You can win even if they personally attack both you and the judge. Been through that too.

Three Kinds of SMEs: Best, Average & Bad

bullseye_arrow_hitWhen your project has a good SME, one with both high knowledge levels and experience, with wisdom from having been there before, and knowing the judge’s views, then your review project is likely to succeed. That means you can attain both high recall of the relevant documents and also high precision. You do not waste much time looking at irrelevant documents.

When an SME has only medium expertise or experience, or both, then the expert tends to err on the side of over-inclusion. They tend to call grey area documents relevant because they do not know they are unimportant. They may also not understand the new Federal Rules of Civil Procedure governing discoverability. Since they do not know, they err on the side of inclusion. True experts know and so tend to be more precise than rookies. The medium level SMEs may, with diligence, also attain high recall, but it takes them longer to get there. The precision is poor. That means wasted money reviewing documents of no value to the case, documents of only marginal relevance that would not survive any rational scrutiny of Rule 26(b)(1).

When the SME lacks knowledge and wisdom, then both recall and precision can be poor, even if the software and methods are otherwise excellent. A bad SME can ruin everything. They may miss most of the relevant documents and end up producing garbage without even knowing it. That is the fault of the person in charge of relevance, the SME, not the fault of predictive coding, nor the fault of the rest of the e-discovery review team.

relevance_targets

top_smeIf the SME assigned to a document review project, especially a project using active machine learning, is a high-quality SME, then they will have a clear grasp of relevance. They will know what types of documents the review team is looking for. They will understand the probative value of certain kids of documents in this particular case. Their judgments on Rule 26(b)(1) criteria as to discoverability will be consistent, well-balanced and in accord with that of the governing judge. They will instruct the whole team, including the machine, on what is true relevant, on what is discoverable and what is not. With this kind of top SME, if the software, methods, including project management, and rest of the review team are also good, then high recall and precision are very likely.

aver_smeIf the SME is just average, and is not sure about many grey area documents, then they will not have a clear grasp of relevance. It will be foggy at best. They will not know what types of documents the review team is looking for. SMEs like this think that any arrow that hits a target is relevant, not knowing that only the red circle in the center is truly relevant. They will not understand the probative value of certain kids of documents in this particular case. Their judgments on Rule 26(b)(1) criteria as to discoverability will not be perfectly consistent, and will end up either too broad or too narrow, and may not be in accord with that of the governing judge. They will instruct the whole team, including the machine, on what might be relevant and discoverable in an unfocused, vague, and somewhat inconsistent manner. With this kind of SME, if the software and methods, including project management, and rest of the review team are also good, and everyone is very diligent, high recall is still possible, but precision is unlikely. Still, the project will be unnecessarily expensive.

The bad SME has multiple possible targets in mind. They just search without really knowing what they are looking for. They will instruct the whole team, including the machine, on what might be relevant and discoverable in an confused, constantly shifting and often contradictory manner. Their obtuse explanations of relevance have little to do with the law, nor the case at hand. They probably have a very poor grasp of Rule 26(b)(1) Federal Rules of Civil Procedure. Their judgments on 26(b)(1) criteria as to discoverability, if any, will be inconsistent, imbalanced and sometimes irrational. This kind of SME probably does not even know the judge’s name, much less a history of their relevance rulings in this type of case. With this kind of SME, even if the software and methods are otherwise good, there is little chance that high recall or precision will be attained. An SME like this does not know when their search arrow has hit center of the target. In fact, it may hit the wrong target entirely. Their thought-world looks like this.

poor_sme

A document project governed by a bad SME runs a high risk of having to be redone because important information is missed. That can be a very costly disaster. Worse, a document important to the producing parties case can be missed and the case lost because of that error. In any event, the recall and precision will both be low. The costs will be high. The project will be confused and inefficient. Projects like this are hard to manage, no matter how good the rest of the team. In projects like this there is also a high risk that privileged documents will accidentally be produced. (There is always some risk of this in today’s high volume ESI world, even with a top-notch SME and review team. A Rule 502(d) Order should always be entered for the protection of all parties.)

Method and Software

The SME and his or her implementing team is just one part of the quality triangle. The other two are Method of electronic document review and Software used for electronic document review.

predictive_coding_quality_triangle-variation

Obviously the e-Discovery Team takes Method very seriously. That is one reason we are constantly tinkering with and improving our methods. We released the breakthrough Predictive Coding 3.0 last year, following 2015 TREC research, and this year, after TREC 2016, we released version 4.0. You could fairly say we are obsessed with the topic. We also focus on the importance of good project management and communications. No matter how good your SME, and how good your software, if your methods are poor, so too will your results in most of your projects. How you go about a document review project, how you manage it, is all-important to the quality of the end product, the production.

predictive_coding_4-0_webThe same holds true for software. For instance, if your software does not have active machine learning capacities, then it cannot do predictive coding. The method is beyond the reach of the software. End of story. The most popular software in the world right now for document review does not have that capacity. Hopefully that will change soon and I can stop talking around it.

Mr_EDREven among the software that has active machine learning, some are better than others. It is not my job to rank and compare software. I do not go around asking for demos and the opportunity to test other software. I am too busy for that. Everyone knows that I currently prefer to use EDR. It is the software by Kroll Ontrack that I use everyday. I am not paid to endorse them and I do not. (Unlike almost every other e-discovery commentator out there, no vendors pay me a dime.) I just share my current preference and pass along cost-savings to my clients.

I will just mention that the only other e-discovery vendor to participate with us at TREC is Catalyst. As most of my readers know, I am a fan of the founder and CEO, John Tredennick. There are several other vendors with good software too. Look around and be skeptical. But whatever you do, be sure the software you use is good. Even a great carpenter with the wrong tools cannot build a good house.

predictive_coding_quality_triangleOne thing I have found, that is just plain common sense, is that with good software and good methods, including good project management, you can overcome many weaknesses in SMEs, except for dishonesty or repeated, gross-negligence. The same holds true for all three corners of the quality triangle. Strength in one can, to a certain extent, make up for weaknesses in another.

To be continued …


%d