Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow – Part Two

September 18, 2016

In Part One we announced the latest enhancements to our document review method, the upgrade to Predictive Coding 4.0. We explained the background that led to this upgrade – the TREC research and the hundreds of projects we have done since our last upgrade a year ago. Millions have been spent to develop the software and methods we now use for Technology Assisted Review (TAR). As a result, our TAR methods are more effective and simpler than ever.

The nine insights we will share are based on our experience and research. Some of our insights may be complicated, especially our lead insight on Active Machine Learning, covered in this Part Two with our new description of IST – Intelligently Spaced Training. We consider IST the smart, human-empowering alternative to CAL. If I am able to write these insights up correctly, their obviousness should come through. They are all simple in essence. The insights and methods of Predictive Coding 4.0 document review are partially summarized in the chart below (which you are free to reproduce without edit).


1st of the Nine Insights: Active Machine Learning

Our method is Multimodal in that it uses all kinds of document search tools. Although we emphasize active machine learning, we do not rely on that method alone. Our method is also Hybrid in that we use both machine judgments and human (lawyer) judgments. Moreover, in our method the lawyer is always in charge. We may take our hand off the wheel and let the machine drive for a while, but under our versions of Predictive Coding, we watch carefully. We remain ready to take over at a moment’s notice. We do not rely on one brain to the exclusion of another. See, e.g., Why the ‘Google Car’ Has No Place in Legal Search (cautioning against over-reliance on fully automated methods of active machine learning). Of course, the converse is also true: we never rely on our human brain alone. It has too many limitations. We enhance our brain with predictive coding algorithms. We add to our own natural intelligence with artificial intelligence. The perfect balance between the two, the Balanced Hybrid, is another of the insights that we will discuss later.

Active Machine Learning is Predictive Coding – Passive Analytic Methods Are Not

Even though our methods are multimodal and hybrid, the primary search method we rely on is Active Machine Learning. The overall name of our method is, after all, Predictive Coding. And, as any information retrieval expert will tell you, predictive coding means active machine learning. That is the only true AI method. The passive type of machine learning that some vendors use under the name Analytics is NOT the same thing as Predictive Coding. These passive Analytics have been around for years and are far less powerful than active machine learning.

These search methods, which used to be called Concept Search, were a big improvement upon relying on keyword search alone. I remember talking about concept search techniques in reverent terms when I did my first Legal Search webinar in 2006 with Jason Baron and Professor Doug Oard. That same year, Kroll Ontrack bought one of the original developers and patent holders of concept search, Engenium. For a short time in 2006 and 2007, Kroll Ontrack was the only vendor to have these concept search tools. The founder of Engenium, David Chaplin, came with the purchase and became Kroll Ontrack’s VP of Advanced Search Technologies for three years. (Here is an interesting interview of Chaplin that discusses what he and Kroll Ontrack were doing with advanced search analytic-type tools when he left in 2009.)

But search was hot, and soon boutique search firms like Clearwell, Cataphora, Content Analyst (the company recently purchased by popular newcomer kCura), and other e-discovery vendors developed their own concept search tools. Again, they were all using passive machine learning. It was a big deal ten years ago. For a good description of these admittedly powerful, albeit now dated, search tools, see the concise, well-written article by D4’s Tom Groom, The Three Groups of Discovery Analytics and When to Apply Them.

Search experts and information scientists know that active machine learning, also called supervised machine learning, was the next big step in search after concept searches, which are, in programming language, also known as passive or unsupervised machine learning. I am getting out of my area of expertise here, and so am unable to go into any details, other than to present the below instructional chart by Hackbright Academy that sets forth key differences between supervised learning (predictive coding) and unsupervised learning (analytics, aka concept search).


What I do know is that the bona fide active machine learning software on the market today all uses either a form of Logistic Regression, as Kroll Ontrack’s does, or SVM, which means Support Vector Machine.
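To make the supervised side concrete, here is a toy sketch of the idea behind logistic-regression-based predictive coding: a classifier fit to a lawyer’s yes/no relevance calls, which then assigns every document a probability of relevance. The features, data, and training routine here are invented for illustration; this is not any vendor’s actual implementation.

```python
import math

# Toy sketch of supervised learning, the core of predictive coding.
# Documents are tiny term-count vectors over ["fraud", "invoice", "lunch"];
# labels are a lawyer's yes/no relevance calls. All data is invented.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights by plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # prediction error drives the weight update
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

X = [[3, 2, 0], [2, 1, 0], [0, 0, 4], [0, 1, 3]]  # term counts per document
y = [1, 1, 0, 0]  # the human supervision: relevant / not relevant

w, b = train_logistic(X, y)

def score(x):
    """Probability of relevance, used to rank the whole collection."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# An unseen document heavy on "fraud" now outranks one heavy on "lunch":
print(score([2, 1, 1]) > score([0, 0, 2]))  # True
```

The key point is the supervision: the weights come entirely from the human judgments in `y`, and the resulting probabilities are what drive the document ranking.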

e-Discovery Vendors Have Been Market Leaders in Active Machine Learning Software

After Kroll Ontrack absorbed the Engenium purchase, and its founder Chaplin completed his contract with Kroll Ontrack and moved on, Kroll Ontrack focused their efforts on the next big step, active machine learning, aka predictive coding. They have always been that kind of cutting-edge company, especially when it comes to search, which is one reason they are one of my personal favorites. A few of the other then-leading e-discovery vendors did too, including especially Recommind and the Israeli-based search company Equivio. Do not get me wrong, the concept search methods, now being sold under the name of TAR Analytics, are powerful search tools. They are a part of our multimodal tool-kit and should be part of yours. But they are not predictive coding. They do not rank documents according to your external input, your supervision. They do not rely on human feedback. They group documents according to passive analytics of the data. It is automatic, unsupervised. These passive analytic algorithms can be good tools for efficient document review, but they are not active machine learning and are nowhere near as powerful.
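For contrast, here is an equally tiny sketch of the passive, unsupervised approach just described: documents grouped purely by similarity, with no human relevance judgment anywhere in the loop. The one-dimensional clustering is a deliberate oversimplification; real concept-search engines use far richer methods, such as latent semantic indexing.

```python
# Toy sketch of unsupervised grouping: two clusters found from document
# similarity scores alone, with no relevance labels supplied by anyone.
# One-dimensional k-means with two clusters; all numbers are invented.

def two_clusters(values, iters=20):
    """Split values into two groups around self-adjusting centers."""
    centers = [min(values), max(values)]
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            nearer = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearer].append(v)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return groups

# "Documents" reduced to a single similarity score; note: no labels anywhere.
low, high = two_clusters([0.1, 0.2, 0.15, 0.8, 0.9, 0.85])
print(low, high)  # [0.1, 0.2, 0.15] [0.8, 0.9, 0.85]
```

The groups may or may not line up with relevance; that is precisely the limitation of passive analytics the paragraph above describes.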


Search Software Ghosts

Many of the software companies that made the multi-million dollar investments necessary to go to the next step and build document review platforms with active machine learning algorithms have since been bought out by big-tech and repurposed out of the e-discovery market. They are the ghosts of legal search past. Clearwell was purchased by Symantec and has since disappeared. Autonomy was purchased by Hewlett Packard and has since disappeared. Equivio was purchased by Microsoft and has since disappeared. See e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part One and Part Two. Recommind was recently purchased by OpenText and, although it is too early to tell for sure, may also soon disappear from e-Discovery.

Slightly outside of this pattern, but with the same ghosting result, e-discovery search company Cataphora was bought by Ernst & Young, and has since disappeared. The year after the acquisition, Ernst & Young added predictive coding features from Cataphora to its internal discovery services. At this point, all of the Big Four accounting firms claim to have their own proprietary software with predictive coding. Along the same lines, at about the time of the Cataphora buy-out, consulting giant FTI purchased another e-discovery document review company, Ringtail Solutions (known for its petri-dish-like visualizations). Although not exactly ghosted from the e-discovery world after the purchase, Ringtail has been absorbed by the giant FTI.

Outside of consulting/accountancy, in the general-service e-discovery industry for lawyers, there are, at this point (late 2016), just a few document review platforms left that have real active machine learning. Some of the most popular ones left behind certainly do not. They only have passive learning analytics. Again, those are good features, but they are not active machine learning, one of the nine basic insights of Predictive Coding 4.0 and a key component of the e-Discovery Team’s document review capabilities.


The power of the advanced, active learning technologies that have been developed for e-discovery is the reason for all of these acquisitions by big-tech and the Big Four or Five. It is not just about wild overspending, although that may well have been the case for Hewlett Packard’s payment of $10.3 billion to buy Autonomy. The ability to do AI-enhanced document search and review is a very valuable skill, one that will only increase in value as our data volumes continue to explode. The tools used for such document review are also quite valuable, both inside the legal profession and, as the ghostings prove, well beyond into big business. See e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part Two.

The indisputable fact that so many big-tech companies have bought up the e-discovery companies with active machine learning software should tell you a lot. It is a testimony to the advanced technologies that the e-discovery industry has spawned. When it comes to advanced search and document retrieval, we in the e-discovery world are the best in the world, my friends, primarily because we have (or can easily get) the best tools. Smile.


Search is king of our modern Information Age culture. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers. The search for evidence to peacefully resolve disputes is, in my most biased opinion, the most important search of all. It sure beats selling sugar water. Without truth and justice, all of the petty business quests for fame and fortune would crumble into anarchy, or worse, dictatorship.

With this background it is easy to understand why some of the e-discovery vendors left standing are not being completely candid about the capabilities of their document review software. (It is called puffing and is not illegal.) The industry is unregulated and, alas, most of our expert commentators are paid by vendors. They are not independent. As a result, many of the lawyers who have tried what they thought was predictive coding, and had disappointing results, have never really tried predictive coding at all. They have just used slightly updated concept search.

Alternatively, some of the disappointed lawyers may have used one of the many now-ghosted vendor tools. They were all early version 1.0 type tools. For example, Clearwell’s active machine learning feature was only on the market for a few months before they were bought and ghosted by Symantec. (I think Jason Baron and I were the first people to see an almost completed demo of their product at a breakfast meeting a few months before it was released.) Recommind’s predictive coding software was well developed at the time of their sell-out, but not its methods of use. Most of its customers can testify as to how difficult it is to operate. That is one reason that OpenText was able to buy them so cheaply, which, we now see, was part of their larger acquisition plan culminating in the purchase of Dell’s EMC document management software.

All software still using early methods – what we call version 1.0 and 2.0 methods, based on control sets – is cumbersome and hard to operate, not just Recommind’s system. I explained this in my article last year, Predictive Coding 3.0. I also mentioned in that article that some vendors with predictive coding would only let you use predictive coding for search. It was, in effect, monomodal. That is also a mistake. All types of search must be used – multimodal – for the predictive coding type of search to work efficiently and effectively. More on that point later.

Maura Grossman Also Blows the Whistle on Ineffective “TAR tools”


Maura Grossman, who is now an independent expert in this field, made many of these same points in a recent interview with Artificial Lawyer, a periodical dedicated to AI and the Law. AI and the Future of E-Discovery: AL Interview with Maura Grossman (Sept. 16, 2016). When asked about the viability of the “over 200 businesses offering e-discovery services” Maura said, among other things:

In the long run, I am not sure that the market can support so many e-discovery providers …

… many vendors and service providers were quick to label their existing software solutions as “TAR,” without providing any evidence that they were effective or efficient. Many overpromised, overcharged, and underdelivered. Sadly, the net result was a hype cycle with its peak of inflated expectations and its trough of disillusionment. E-discovery is still far too inefficient and costly, either because ineffective so-called “TAR tools” are being used, or because, having observed the ineffectiveness of these tools, consumers have reverted back to the stone-age methods of keyword culling and manual review.

Now that Maura is no longer with the conservative law firm of Wachtell Lipton, she has more freedom to speak her mind about caveman lawyers. It is refreshing and, as you can see, echoes much of what I have been saying. But wait, there is still more that you need to hear from the interview of new Professor Grossman:

It is difficult to know how often TAR is used given confusion over what “TAR” is (and is not), and inconsistencies in the results of published surveys. As I noted earlier, “Predictive Coding”—a term which actually pre-dates TAR—and TAR itself have been oversold. Many of the commercial offerings are nowhere near state of the art; with the unfortunate consequence that consumers have generalised their poor experiences (e.g., excessive complexity, poor effectiveness and efficiency, high cost) to all forms of TAR. In my opinion, these disappointing experiences, among other things, have impeded the adoption of this technology for e-discovery. …

Not all products with a “TAR” label are equally effective or efficient. There is no Consumer Reports or Underwriters Laboratories (“UL”) that evaluates TAR systems. Users should not assume that a so-called “market leading” vendor’s tool will necessarily be satisfactory, and if they try one TAR tool and find it to be unsatisfactory, they should keep evaluating tools until they find one that works well. To evaluate a tool, users can try it on a dataset that they have previously reviewed, or on a public dataset that has previously been labelled; for example, one of the datasets prepared for the TREC 2015 or 2016 Total Recall tracks. …

She was then asked another popular question by the Artificial Lawyer interviewer (whose name is never identified), the publication apparently being based in the UK:

As is often the case, many lawyers are fearful about any new technology that they don’t understand. There has already been some debate in the UK about the ‘black box’ effect, i.e., barristers not knowing how their predictive coding process actually worked. But does it really matter if a lawyer can’t understand how algorithms work?

The following is an excerpt of Maura’s answer. I suggest you consult the full article for a complete picture. AI and the Future of E-Discovery: AL Interview with Maura Grossman (Sept. 16, 2016). I am not sure whether she put on her Google Glasses to answer (probably not), but anyway, I rather like it.

Many TAR offerings have a long way to go in achieving predictability, reliability, and comprehensibility. But, the truth that many attorneys fail to acknowledge is that so do most non-TAR offerings, including the brains of the little black boxes we call contract attorneys or junior associates. It is really hard to predict how any reviewer will code a document, or whether a keyword search will do an effective job of finding substantially all relevant documents. But we are familiar with these older approaches (and we think we understand their mechanisms), so we tend to be lulled into overlooking their limitations.

The brains of the little black boxes we call contract attorneys or junior associates. So true. We will go into that more thoroughly in our discussion of the GIGO & QC insight.

Recent Team Insights Into Active Machine Learning

To summarize what I have said so far, in the field of legal search, only active machine learning:

  • effectively enhances human intelligence with artificial intelligence;
  • qualifies for the term Predictive Coding.

I want to close this discussion of active machine learning with one more insight. This one is slightly technical and, again, if I explain it correctly, should seem perfectly obvious. It is certainly not new, and most search experts will already know this to some degree. Still, even for them, there may be some nuances to this insight that they have not thought of. It can be summarized as follows: active machine learning should have a double feedback loop with active monitoring by the attorney trainers.


Active machine learning should create feedback for both the algorithm (the data classified) AND the human managing the training. Both should learn, not just the robot. They should, so to speak, be friends. They should get to know each other.

Many predictive coding methods that I have read about, or heard described, including how I first used active machine learning, did not sufficiently include the human trainer in the feedback loop. They were static types of training using a single feedback loop. These methods are, so to speak, very stand-offish, aloof. Under these methods the attorney trainer does not even try to understand what is going on with the robot. The information flow was one-way, from attorney to machine.

As I grew more experienced with the EDR software, I started to realize that it is possible to understand, at least a little, what the black box is doing. Logistic-based AI is a foreign intelligence, but it is intelligence. After a while you start to understand it. So although I started by using one-sided machine training, I slowly gained the ability to read how EDR was learning. I then added another dimension, another feedback loop, and a very interesting one indeed. Now I not only trained and provided feedback to the AI as to whether its predictions of relevance were correct, or not, but I also received training from the AI as to how well, or not, it was learning. That in turn led to the humorous personification of the Kroll Ontrack software that we now call Mr. EDR. When we reached this level, machine training became a fully active, two-way process.

We now understand that to fully supervise a predictive coding process you have to have a good understanding of what is happening. How else can you supervise it? You do not have to know exactly how the engine works, but you at least need to know how fast it is going. You need a speedometer. You also need to pay attention to how the engine is operating, whether it is over-heating, needs oil or gas, etc. The same holds true for teaching humans. Their brains are indeed mysterious black boxes. You do not need to know exactly how each student’s brain works in order to teach them. You find out if your teaching is getting through by asking questions.

For us, supervised learning means that the human attorney has an active role in the process, a role where the attorney trainer learns by observing the trainee, the AI in creation. I want to know as much as possible, so long as it does not slow me down significantly.

In other methods of using predictive coding that we have used or seen described, the only role of the human trainer is to say yes or no as to the relevance of a document. The decision as to which documents to select for training has already been predetermined. Typically it is the highest-ranked documents, but sometimes some mid-ranked “uncertain documents” or some “random documents” are added to the mix. The attorney has no say in what documents to look at. They are all fed to him or her according to predetermined rules. These decision-making rules are set in advance and do not change. These active machine learning methods work, but they are slow and less precise, not to mention boring as hell.
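The predetermined, single-loop selection rules described above can be sketched in a few lines. The function name, batch size, and mix of top-ranked, uncertain, and random documents are all invented for illustration; the point is that the rules are fixed in advance and the attorney never chooses.

```python
import random

# Sketch of fixed, single-loop selection rules: the machine alone decides
# which documents the reviewer sees next, mixing top-ranked, "uncertain",
# and random documents by a rule set in advance and never changed.

def next_batch(ranked_docs, batch_size=10, uncertain_share=0.2, random_share=0.1):
    """ranked_docs: list of (doc_id, probability), sorted high to low."""
    n_uncertain = int(batch_size * uncertain_share)
    n_random = int(batch_size * random_share)
    n_top = batch_size - n_uncertain - n_random

    top = ranked_docs[:n_top]
    # "Uncertain" documents sit nearest the 50% probability line.
    uncertain = sorted(ranked_docs, key=lambda d: abs(d[1] - 0.5))[:n_uncertain]
    rest = [d for d in ranked_docs if d not in top and d not in uncertain]
    return top + uncertain + random.sample(rest, min(n_random, len(rest)))

docs = [(i, 1 - i / 100) for i in range(100)]  # already sorted high to low
batch = next_batch(docs)
print(len(batch))  # 10
```

However the shares are tuned, the attorney reviewing `batch` is a passive consumer of it; there is no second loop feeding the attorney’s observations back into the selection rule itself.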

The recall of these single-loop passive supervision methods may also not be as good. The jury is still out on that question. We are trying to run experiments on that now, although it can be hard to stop yawning. See an earlier experiment on this topic testing the single loop teaching method of random selection: Borg Challenge: Report of my experimental review of 699,082 Enron documents using a semi-automated monomodal methodology.

These mere yes or no, limited participation methods are hybrid Man-Machine methods, but, in our opinion, they are imbalanced towards the Machine. (Again, more on the question of Hybrid Balance will be covered in the next installment of this article.) This single versus dual feedback approach seems to be the basic idea behind the Double Loop Learning approach to human education depicted in the diagram below. Also see Graham Attwell, Double Loop Learning and Learning Analytics (Pontydysgu, May 4, 2016).


To quote Wikipedia:

The double loop learning system entails the modification of goals or decision-making rules in the light of experience. The first loop uses the goals or decision-making rules, the second loop enables their modification, hence “double-loop.” …

Double-loop learning is contrasted with “single-loop learning”: the repeated attempt at the same problem, with no variation of method and without ever questioning the goal. …

Double-loop learning is used when it is necessary to change the mental model on which a decision depends. Unlike single loops, this model includes a shift in understanding, from simple and static to broader and more dynamic, such as taking into account the changes in the surroundings and the need for expression changes in mental models.


The method of active machine learning that we use in Predictive Coding 4.0 is a type of double loop learning system. As such it is ideal for legal search, which is inherently ad hoc, where even the understanding of relevance evolves as the project develops. As Maura noted near the end of the Artificial Lawyer interview:

… e-discovery tends to be more ad hoc, in that the criteria applied are typically very different for every review effort, so each review generally begins from a nearly zero knowledge base.

The driving impetus behind our double feedback loop system is to allow the selection of training documents to vary according to the circumstances encountered. Attorneys select documents for training and then observe how these documents impact the AI’s overall ranking of the documents. Based on this information, the attorney then decides which documents to next submit for training. A single fixed mental model is not used, such as only submitting the ten highest-ranked documents for training.

The human stays involved and engaged and selects the next documents to add to the training based on what she sees. This makes the whole process much more interesting. For example, if I find a group of relevant spreadsheets by some other means, such as a keyword search, then, when I add these documents to the training, I observe how they impact the overall ranking of the dataset. For instance, did this training result in an increase in the relevance ranking of other spreadsheets? Was the increase nominal or major? How did it impact the ranking of other documents? For instance, were emails with a lot of numbers in them suddenly ranked much higher? Overall, was this training effective? Were the documents that moved up in rank to the top, or near top, of probable relevance in fact relevant as predicted? What was the precision rate like for these documents? Does the AI now have a good understanding of the relevance of spreadsheets, or does it need more training on that type of document? Should we focus our search on other kinds of documents?
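One simple way to “observe how these documents impact the overall ranking” is to diff the probability bands before and after a training session, as sketched below. The band edges and the toy probabilities are invented for illustration.

```python
from collections import Counter

# Sketch of the "observe the impact" step: compare how many documents sit
# in each probability band before and after a training session.

def band(prob, edges=(0.9, 0.5, 0.1)):
    for e in edges:
        if prob >= e:
            return f">={int(e * 100)}%"
    return "<10%"

def ranking_shift(before, after):
    """before/after: {doc_id: probability}. Returns per-band counts."""
    return (Counter(band(p) for p in before.values()),
            Counter(band(p) for p in after.values()))

before = {1: 0.95, 2: 0.40, 3: 0.05, 4: 0.55}
after = {1: 0.97, 2: 0.91, 3: 0.04, 4: 0.93}  # after training on spreadsheets
b, a = ranking_shift(before, after)
print(b[">=90%"], "->", a[">=90%"])  # prints: 1 -> 3
```

A jump like this (one document in the 90%+ band before the session, three after) is exactly the kind of signal the attorney reads to decide what to train on next.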

You see all kinds of variations on that. If the spreadsheet understanding (ranking) is good, how does it compare to its understanding (correct ranking) of Word docs or emails? Where should I next focus my multimodal searches? Which documents should I next assign to my reviewers to read and make a relevancy determination? These kinds of considerations keep the search interesting, fun even. Work as play is the best kind. Typically we simply assign for attorney review the documents that have the highest ranking (which is the essence of what Grossman and Cormack call CAL), but not always. We are flexible. We, the human attorneys, are the second positive feedback loop.

We like to remain in charge of teaching the classifier, the AI. We do not just turn it over to the classifier to teach itself. Although sometimes, when we are out of ideas and are not sure what to do next, we will do exactly that. We will turn over to the computer the decision of what documents to review next. We just go with his top predictions and use those documents to train. Mr. EDR has come through for us many times when we have done that. But this is more the exception than the rule. After all, the classifier is a tabula rasa. As Maura put it: each review generally begins from a nearly zero knowledge base. Before the training starts, it knows nothing about document relevance. The computer does not come with built-in knowledge of the law or relevance. You know what you are looking for. You know what is relevant, even if you do not know how to find it, or even whether it exists at all. The computer does not know what you are looking for, aside from what you have told it by your yes-no judgments on particular documents. But, after you teach it, it knows how to find more documents that probably have the same meaning.

By observation you can see for yourself, first hand, how your training is working, or not working. It is like a teacher talking to their students to find out what they learned from the last assigned reading materials. You may be surprised by how much, or how little, they learned. If the last approach did not work, you change the approach. That is double-loop learning. In that sense our active monitoring approach is like a continuous dialogue. You learn how, and if, the AI is learning. This in turn helps you to plan your next lessons. What has the student learned? Where does the AI need more help to understand the conception of relevance that you are trying to teach it?

This monitoring of the AI’s learning is one of the most interesting aspects of active machine learning. It is also a great opportunity for human creativity and value. The inevitable advance of AI in the law can mean more jobs for lawyers overall, but only for those able to step up and change their methods. The lawyers able to play the second-loop game of active machine learning will have plenty of employment opportunities. See, e.g., Thomas H. Davenport, Julia Kirby, Only Humans Need Apply: Winners and Losers in the Age of Smart Machines (Harper 2016).

Going down into the weeds a little bit more, our active monitoring, dual feedback approach means that when we use Kroll Ontrack’s EDR software, we adjust the settings so that new learning sessions are not created automatically. They only run when and if we click on the Initiate Session button shown in the EDR screenshot below (arrow and words were added). We do not want the training to go on continuously in the background (typically meaning at periodic intervals of every thirty minutes or so). We only want the learning sessions to occur when we say so. In that way we can know exactly what documents EDR is training on during a session. Then, when that training session is complete, we can see how the input of those documents has impacted the overall data ranking. For instance, are there now more documents in the 90% or higher probable relevance category, and if so, how many? The picture below is of a completed TREC project. The probability rankings are on the far left, with the number of documents shown in the adjacent column. Most of the documents in the 290,099-document collection of Bush email were in the 0-5% probable relevance ranking not included in the screen shot.


This means that the e-Discovery Team’s active learning is not continuous, in the sense of always training. It is instead intelligently spaced. That is an essential aspect of our Balanced Hybrid approach to electronic document review. The machine training only begins when we click on the “Initiate Session” button in EDR that the arrow points to. It is only continuous in the sense that the training continues until all human review is completed. The spaced training, in the sense of being staggered in time, is itself an ongoing process until the production is completed. We call this Intelligently Spaced Training, or IST. Such ongoing training improves efficiency and precision, and also improves Hybrid human-machine communications. Thus, in our team’s opinion, IST is a better process of electronic document review than training automatically without human participation, the so-called CAL approach promoted (and recently trademarked) by search experts and professors Maura Grossman and Gordon Cormack.


Exactly how we space out the timing of training in IST is a little more difficult to describe without going into the particulars of a case. A full, detailed description would require the reader to have intimate knowledge of the EDR software. Our IST process is, however, software neutral. You can follow the IST dual feedback method of active machine learning with any document review software that has active machine learning capacities and also allows you to decide when to initiate a training session. (By the way, a training session is the same thing as a learning session, but we like to say training, not learning, as that takes the human perspective and we are pro-human!) You cannot do that if the training is literally continuous and cannot be halted while you input a new batch of relevance determined documents for training.
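As a software-neutral sketch, the IST control flow can be reduced to this: the attorney codes documents found by any search method, and a training session runs only on an explicit human action, never on a background timer. The `Trainer` class and its method names below are hypothetical, not Kroll Ontrack’s actual API.

```python
# Software-neutral sketch of the IST control flow: judgments accumulate
# from any search method, and training runs only on an explicit human
# action. Class and method names are hypothetical, not a real API.

class Trainer:
    def __init__(self):
        self.labeled = {}   # doc_id -> human yes/no relevance judgment
        self.sessions = 0   # no background timer ever increments this

    def add_judgments(self, judgments):
        """Attorney codes a batch found by keyword, concept, or AI ranking."""
        self.labeled.update(judgments)

    def initiate_session(self):
        """The human 'Initiate Session' click: retrain on all labels so far."""
        self.sessions += 1
        return f"session {self.sessions}: trained on {len(self.labeled)} docs"

ist = Trainer()
ist.add_judgments({101: 1, 102: 0})   # found by keyword search
ist.add_judgments({103: 1})           # found by a concept search
print(ist.initiate_session())         # prints: session 1: trained on 3 docs
```

Because nothing retrains until `initiate_session` is called, the human always knows exactly which documents the session trained on, and can then observe its effect on the ranking before deciding what to do next.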

The details of IST, such as when to initiate a training session and which human-coded documents to select next for training, are an ad hoc process. They depend on the data itself, the issues involved in the case, the progress made, the stage of the review project, and time factors. This is the kind of thing you learn by doing. It is not rocket science, but it does help keep the project interesting. Hire one of our team members to guide your next review project and you will see it in action. It is easier than it sounds. With experience, Hybrid Multimodal IST becomes an intuitive process, much like riding a bicycle.

To summarize, active machine learning should be a dual feedback process with double-loop learning. The training should continue throughout a project, but it should be spaced in time so that you can actively monitor the progress, what we call IST. The software should learn from the trainer, of course, but the trainer should also learn from the software. This requires active monitoring by the teacher, who reacts to what he or she sees and adjusts the training accordingly so as to maximize recall and precision.

This is really nothing more than a common sense approach to teaching. No teacher who just mails in their lessons, and does not pay attention to the students, is ever going to be effective. The same is true for active machine learning. That’s the essence of the insight. Simple really.

Next, in Part Three, I will address the related insights of Balanced Hybrid.

To be Continued …

e-Discovery Team’s Best Practices Education Program

May 8, 2016


  • EDBP
  • Mr. EDR
  • Predictive Coding 3.0
  • 59 TAR Articles
  • Doc Review Videos



e-Discovery Team Training

Information → Knowledge → Wisdom

Education is the clearest path from Information to Knowledge in all fields of contemporary culture, including electronic discovery. The above links take you to the key components of the best-practices teaching program I have been working on since 2006. It is my hope that these education programs will help move the Law out of the dangerous information flood, where it is now drowning, to a safer refuge of knowledge. See Information → Knowledge → Wisdom: Progression of Society in the Age of Computers; and How The 12 Predictions Are Doing That We Made In “Information → Knowledge → Wisdom.” For more of my thoughts on e-discovery education, see the e-Discovery Team School Page.

The best practices and general educational curriculum that I have developed over the years focuses on the legal services provided by attorneys. The non-legal, engineering and project management practices of e-discovery vendors are only collaterally mentioned. They are important too, but students have the EDRM and other commercial organizations and certifications for that. Vendors are part of any e-Discovery Team, but the programs I have developed are intended for law firms and corporate law departments.

The e-Discovery Team program, both general educational and legal best-practices, is online and available 24/7. It uses lots of imagination, creative mixes, symbols, photos, hyperlinks, interactive comments, polls, tweets, posts, news, charts, drawings, videos, video lectures, slide lectures, video skits, video slide shows, music, animations, cartoons, humor, stories, cultural themes and analogies, inside baseball references, rants, opinions, bad jokes, questions, homework assignments, word-clouds, links for further research, a touch of math, and every lawyer’s favorite tools: words (lots of them), logic, arguments, case law and precedent.

All of this is an effort to take the e-Discovery Team approach from mere information to knowledge. In spite of these efforts, most of the legal community still does not know e-discovery very well. What they do know is often misinformation. Scenes like the following in a law firm lit-support department are all too common.

The e-Discovery Team’s education program has an emphasis on document review. That is because the fees for lawyers reviewing documents are by far the most expensive part of e-discovery, even when contract lawyers are used. The lawyer review fees, and review supervision fees, including SME fees, have always been much more costly than all vendor costs and expenses put together. Still, the latest AI technologies, especially active machine learning using our Predictive Coding 3.0 methods, are now making it possible to significantly reduce review fees. We believe this is a critical application of best practices. The three steps we identify for this area in the EDBP chart are shown in green, to signify money. The reference to C.A. Review is to Computer Assisted Review, or CAR, using our Hybrid Multimodal methods.



Predictive Coding 3.0 Hybrid Multimodal Document Search and Review

Our new version 3.0 techniques for predictive coding make it far easier than ever before to include AI in a document review project. The secret control set has been eliminated, and so have the seed set and the practice of SMEs wasting their time reviewing random samples of mostly irrelevant junk. It is a much simpler technique now, although we still call it Hybrid Multimodal.

Hybrid is a reference to the Man/Machine interactive nature of our methods. A skilled attorney uses a type of continuous active learning to train an AI to help them find the documents they are looking for. This Hybrid method greatly augments the speed and accuracy of the human attorneys in charge. This leads to cost savings and improved recall. A lawyer with an AI helper at their side is far more effective than lawyers working on their own. This means that every e-discovery team today could use a robot like Kroll Ontrack’s Mr. EDR to help them do document review.

Multimodal is a reference to the use of a variety of search methods to find target documents, including, but not limited to, predictive coding type ranked searches. We encourage humans in the loop running a variety of searches of their own invention, especially at the beginning of a project. This always makes for a quick start in finding relevant and hot documents. Why the ‘Google Car’ Has No Place in Legal Search. The multimodal approach also makes for precise, efficient reviews with broad scope. The latest active machine learning software when fully integrated with a full suite of other search tools is attaining higher levels of recall than ever before. That is one reason Why I Love Predictive Coding.
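To make the multimodal idea concrete, here is a small, hypothetical sketch in Python of one way human-invented keyword searches and predictive-coding ranking can feed a single review queue. The document set, scores, and function names are invented for illustration and do not correspond to any vendor's software.

```python
# Hypothetical multimodal pass: union keyword hits with the top of the
# machine-ranked list, so human searches and predictive coding feed the
# same review queue (illustrative only, not any vendor's API).
documents = {
    1: "side letter regarding the merger",
    2: "cafeteria menu for the week",
    3: "board minutes discussing the merger terms",
}
model_rank = {1: 0.91, 2: 0.05, 3: 0.62}  # assumed predictive-coding scores

def keyword_hits(docs, term):
    """A human-invented search: return ids of docs containing the term."""
    return {doc_id for doc_id, text in docs.items() if term in text}

def review_queue(docs, scores, term, cutoff=0.5):
    """Union of keyword hits and high-ranked docs, ordered by model score."""
    selected = keyword_hits(docs, term) | {d for d, s in scores.items() if s >= cutoff}
    return sorted(selected, key=lambda d: -scores[d])

print(review_queue(documents, model_rank, "merger"))  # prints [1, 3]
```

The same queue-building step could take in any number of other search modes (similarity, concept, metadata filters) by adding their hit sets to the union, which is the sense in which the approach is multimodal.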

I have found that Kroll Ontrack’s EDR software is ideally suited for these Hybrid, Multimodal techniques. Try using it on your next large project and see for yourself. The Kroll Ontrack consultant specialists in predictive coding, Jim and Tony, have been trained in this method (and many others). They are well qualified to assist you every step of the way, and their rates are reasonable. With you calling the shots on relevancy, they can do most of the search work for you and still save your client money. If the matter is big and important enough, then, if I have a time opening, and it clears my firm’s conflicts, I can also be brought in for a full turn-key operation. Whether you want to include extra time for training your best experts is your option, though it is our preference.



Embrace e-Discovery Team Education to Escape Information Overload


Spoliation Sanctions: The Tide is Turning, Goliath is Smiling

July 29, 2013


Since e-discovery law first began in the nineteen-nineties, the problem of spoliation and sanctions has primarily been that of large corporations. In the classic David and Goliath scenario, typical for instance in employment litigation, the little David plaintiffs had a powerful slingshot to win cases: spoliation and sanctions. The large corporate defendants had zillions of bits of ESI; the little guy David requesting data had little or no bits. This imbalance of burden allowed the clever plaintiff’s counsel to win a case, not on the merits of their client’s complaint, but on the failures of the defendant corporation, the failures to lock down and secure all potentially relevant ESI from destruction. If the little guy could prove intentional destruction by one of the key players for the corporate defendant, then it was game over for the defendant. The famous Zubulake saga by Judge Scheindlin is an employment law case that proved the point. After Zubulake very few defendants would even risk going to trial, like UBS Warburg did in Zubulake. Instead, all too often, they just paid too much to settle the case.

Electronic discovery became the great equalizer in the seemingly never-ending battle between the plaintiffs Bar and the defense Bar. Even if a plaintiff could not prove spoliation, there was always an ESI cloud hanging over defense counsel’s head. Did their client preserve everything they needed to? How much was enough? How were they going to respond to plaintiff’s requests to produce? After a while, by around 2010, things got so bad that just the threat of a production request was enough to force inflated settlements in some cases. Clever plaintiff’s counsel did not even have to get into spoliation or sanctions motion practice; all they had to do was get an order compelling production. The costs to search and produce might very well exceed the cost to settle the case. Yes, little David was smiling. All he had to do was take his e-discovery slingshot out of his pocket and the allegedly big bad Goliath might run away.

Continuing Legal Education

Many e-discovery specialists, especially those of us who knew the Goliaths, and knew they were not as bad as the propaganda would suggest, were upset by all the cases we saw settling without regard to merits. Many defense counsel, as I was until 2006, really hated to see claims paid that they knew were bogus. It just did not seem right, even though it made economic sense to insurance adjusters. That was part of the reason that in 2006 I gave up my regular, admittedly defense oriented practice (although I always did some plaintiff’s work too when possible), and only worked on electronic discovery. Like many other tech-oriented attorneys I was focused on trying to level the playing field. I wanted fair trials to come back. I wanted Goliath to clean up his act so that would be possible. I wanted to teach Goliath how to preserve, search, and produce in a fair and proportional manner so that he would not have to shy away from David’s e-discovery slingshot.

Yes. I wanted Goliath to reform, to stop all destruction of ESI, to preserve in a prudent manner. So did many other attorneys moving into the e-discovery field from both sides. We all wanted cases to be decided on the merits. We all wanted important evidence to be preserved, not destroyed, either intentionally or by negligence. The answer then and now seemed to be a combination of education and technology. Most attorneys who had developed knowledge and skills gave of their time to present CLEs that would help other attorneys to preserve and protect, to search and serve. CLEs were a good way to network and improve everyone’s skills.

These education efforts have been going on for decades now. I estimate that tens of thousands of e-discovery CLEs have been presented by now, most by a relatively small cadre of dedicated judges, lawyers and techs. This effort has now even become a second-generation effort, with my son, Adam Losey, and his non-profit foundation, IT-Lex, putting on its first CLE in Florida on October 17 and 18, 2013. It’s called Innovate, and will have many of the top speakers and educators in a state-of-the-art event. (I suggest you sign up soon; like the Sedona Conference, attendance at Innovate will be capped.)

Why Goliath Is Smiling

After all of these years of e-discovery anguish, Goliath is finally starting to smile. There are two reasons. First, all of this hard work at education and training of the holders of too-much-information is starting to pay off. Goliath is learning how to save evidence. The many thousands of CLEs on e-discovery given over the past fifteen years are starting to bear fruit. Even though corporations have more data than ever before, and more kinds of ESI, they are getting their act together. More and more of them have dedicated internal e-discovery teams, a move I have been pushing in this e-discovery team blog since 2006. Just as important, large organizations have now learned, many of them the hard way, but they learned, that they need to hire lawyers who are competent in e-discovery. They have learned that competence in litigation alone is not adequate. A new class of e-discovery lawyers has emerged, and they are serving their clients well.

The second reason Goliath is smiling is that little David is not so little any more. Frankly, he has become quite pudgy, bloated with ESI. Yes, even the little guys today have more ESI with potential evidence than they know what to do with, much less know how to preserve, search and produce. Also, it turns out that David is not always the honest boy scout we had been led to believe. The little guys are now often the ones out there spoliating away, thinking that they can get away with it, just like Laura Zubulake’s supervisor a decade ago. It is all just people after all, and some people are not that honest. If they are not stopped by assertive counsel, they will try to hide the bad stuff.

The plaintiffs bar is now starting to find itself in the same position the defense bar was in a few years ago. From my front-line position in the employment law practice, where small versus big is the norm, I think the tide has now turned. It is the plaintiffs now who cannot seem to save, find, or produce their ESI. It is the plaintiffs now who are losing cases, in spite of the merits, based on their intentional or negligent destruction of evidence. It is the plaintiffs who are getting bad or no advice from their attorneys about preservation. It is the plaintiffs who must now settle because they could not or cannot find and produce all of the evidence responsive to defendants’ requests, all of the Facebook posts, tweets, texts, and email. The tide has turned.

I offer two recent examples of this, one from my home court in Orlando, Florida, and another from Alabama. In the Orlando case, a former employee was caught throwing her computer into the river. Simon Property Group, Inc. v. Lauria, 2012 U.S. Dist. LEXIS 184638 (M.D. Fla. 2012). Good defense counsel tactics uncovered this fraud, a fraud that, in my opinion, could have been prevented by better advice to plaintiff from her legal counsel. For details on this case, look at the other blog I edit and write under my law firm’s banner, eDiscoveryLawToday. The article is titled Fishing Expedition Discovers Laptop Cast into Indian River.

In the other case, out of Alabama, the employee claiming reverse discrimination decided to get rid of her computer after it looked like a forensic examination would be ordered. Evans v. Mobile Cnty. Health Dep’t, 2012 U.S. Dist. LEXIS 8530 (S.D. Ala. 2012). Rather than throw it in the river, the plaintiff here burned her computer in her backyard. Once again, in my opinion plaintiff’s counsel was asleep at the wheel, or worse. I wrote an article on this one too: Plaintiff’s Backyard Summer BBQ of Her Computer Leads to Sanctions. The truth is always stranger than fiction. Read the articles and cases for yourself to see what is going on in litigation today. The tide is changing. E-discovery is starting to be some defense counsel’s best friend.

Boomerang Effect

Electronic discovery specialists have explained for years now to attorneys who attend e-discovery CLEs what I call the boomerang effect: you cannot take action to enforce e-discovery, to compel and to sanction, unless your own house is in order. You may throw the sanctions boomerang at the plaintiff only to have it come back at you. Just ask Apple in the Samsung patent case last year, where both sides ended up with spoliation sanctions and countervailing adverse inferences. Apple v. Samsung, U.S.D.C., N.D. Cal., Case No. 11-CV-01846-LHK, Document 1894, Filed 08/21/12.

Awareness of the boomerang effect is one way we motivated and cajoled attorneys to get their clients to take preservation of ESI more seriously. If you do not want the other side to get away with breaking the rules, then you need to be very sure you are following the rules yourself. In the case of large organizations this means spending money on people, e-discovery teams, and technology.

The good news is this instruction has finally worked. It has taken over a decade, but the attendees of these many CLEs, including thousands of in-house counsel, have heard and have taken action. Many corporations have now made substantial investments. Even though their ESI is still growing at an exponential rate, and is still disproportional to David’s, they have their act together. Their e-discovery teams can save, find and produce, thanks to the new technologies of the vendors, and the new skills of many lawyers.

Where Are All The Plaintiffs Counsel?

I have always been willing to teach David’s attorneys too, and so have others from both sides of the bar, but most plaintiffs counsel have not cared about e-discovery. In my view the plaintiffs’ bar has not been trying as hard as the defense bar. Now, as a whole, they have fallen way behind.

Plaintiff’s counsel have always been a no-show at e-discovery education events, despite the fact that several of the country’s top experts are openly plaintiff. You have only to think of Bill Butterfield or Ariana J. Tadler. I admit that I was once openly defense, but have tried to go neutral since 2006. Unlike some others in the field who have gone neutral, or attempted to do so, I am aware that I may still be somewhat biased. (Maybe even more than somewhat, but at least I am working on it.) With this perhaps tempering bias in mind, I cannot help but think that the plaintiffs Bar has created this problem, where the pendulum has now begun to swing against them. They have done so by their own laziness and lack of foresight.

Harsh words, I know, and of course there are many exceptions (for instance, a large plaintiffs firm in my locale, Morgan & Morgan, insisted that one of their new hires take my course at the University of Florida), but how else do you explain that at every single CLE event I have attended since 2006, over 90% of the attendees have been defense oriented? The truth is, when I ask at CLEs how many attendees primarily do plaintiffs work, usually only one or two hands go up, sometimes none.

I have talked to other CLE presenters and sponsors about this. They see the same thing. The consensus is that plaintiffs counsel rarely attend e-discovery educational CLEs, even when they are free. Of course, as mentioned, there are exceptions to this general rule; some plaintiff firms understand very well the importance of e-discovery and continuing legal education. They are privately chagrined that their colleagues do not seem to be listening to them. The truth is, the vast majority of plaintiffs counsel just do not get it.

Not yet anyway. After they start losing cases because of sanctions, like corporate Goliaths did just a few years ago, the rest of the plaintiff bar will wake up. More personal sanctions of plaintiffs counsel will also help them wake up. Then they will start insisting that their individual clients do the right thing, that small plaintiffs preserve all of the evidence, not just the ESI that supports their position. I predict that even after a series of cases like this, and even after the plaintiffs bar changes its attitude and starts to take e-discovery education seriously, it will take some time to improve the skills of that many lawyers. (There are more plaintiffs lawyers than defense lawyers, including the solos and part-timers.) In the meantime, defense counsel may finally have the edge for a while. So too will the few plaintiff’s counsel who already get it. But is this a good thing? Does it serve justice?


The sudden emergence of the common Man as a holder of too-much-information has caught most plaintiffs attorneys by surprise. They are unprepared and vulnerable to defense counsel boomerangs. Some defense counsel may celebrate and gloat about this turn around, but I do not. I am concerned about the big picture. I see danger in this imbalance of legal skills. I see danger to the rights of our citizens to obtain equal justice under the law. I also see danger in allowing destruction of evidence by any party to go unchecked. There is no easy answer.

Sanctions for spoliation are no longer a weapon available only to a few sophisticated counsel in the plaintiffs bar. They are now also available to many, if not most, large corporations and their attorneys. There are a growing number of highly trained defense attorneys who are no longer afraid of the boomerang effect. They are free to fight the fraudulent or negligent withholding of evidence on the part of those who sue them. Many large organizations today have their house in order and are preserving all of the evidence that proportionality demands. This is good news for those concerned about e-discovery extortion and settlements forced upon defendants. I certainly celebrate this. It is a goal I have long worked towards. But are we now in danger of excess in the other direction due to the general lack of skills by the plaintiffs bar?

Cases should be decided on the merits, on the evidence. I do not want plaintiffs to have to settle too cheaply, or lose their meritorious case, just because their attorneys did not know e-discovery, did not know they should make sure their clients preserve evidence, did not know how to make a production. It is not good for our system of justice when this happens to any party, defendants or plaintiffs. That is part of the big picture. But the other part is that the truth must always prevail. Plaintiffs must not be allowed to destroy evidence any more than defendants. If evidence is destroyed by anyone, it is much harder to know the truth. If evidence is destroyed or altered, then the responsible party should be sanctioned. So too should their attorneys, if they are to blame. There is no one-free-bite for spoliation, be they defendant or plaintiff.

Sanctions should be imposed regardless of whether it is the plaintiff or defendant who is guilty. Sanctions of erring parties, and their attorneys, can have a strong deterrent effect. They can motivate all litigants and their attorneys to take the time and effort needed to marshal and preserve evidence. They can also motivate attorneys to attend CLEs and really learn and practice these new skills. Preservation of evidence is a serious matter for both sides of the “v.” Punishment by sanctions is a strong deterrent of both intentional bad-faith actions and negligent actions. In either event, the truth has been lost, and justice has been tainted.

Fear of sanctions is an important motivator. That is one reason I suggest the bar be very careful about any amendment to the rules that weakens a judge’s discretion to sanction a party for deletion or alteration of evidence. The defense bar especially should be careful about what it asks for in the way of rules reform. They should think twice about pushing new rules that make it harder for judges to punish litigants for wrongful conduct, especially in the area of ESI preservation. They may end up hoisted by their own petard.

The Simon Property Group and Evans cases were easy facts, but many other cases of plaintiff spoliation are not. It was clear the plaintiffs in these cases did the deeds themselves and did them in bad faith. The harsh sanctions were appropriate. The court’s discretion in this area was properly exercised.

I continue to believe that this judicial discretion to punish where appropriate, either the litigant, or their attorney, or both, should be preserved and strengthened, not hindered. I continue to prefer legal education and technology as a solution to the preservation problem, not a weakened judiciary. We have a delicate balance here, one that requires wisdom and good judgment by our judiciary. Too many new rules may upset the balance we now have.
