Return of the Robots!

June 29, 2014

Tired of all of the words thrown at you by the e-Discovery Team blog? Just want to relax and enjoy the summer, but still keep up? Maybe learn something interesting and potentially useful? We understand. We have just the thing for you: a nostalgic look back at our robot movies. They are not extinct yet, and although some sequels stink, these are pretty good. Our robots cover transforming topics that are still cutting edge. They explain the use of storytelling and gamification in predictive coding. They also cover the ethics of viruses and bad robots, and then end with our robots getting ready to testify before Judge Waxse on random sampling in predictive coding. I dare say few people can follow their talk on sampling in just one viewing.

Love words like we do? Not satisfied with robot reruns? We understand that too. Our summer reading is mainly full of cool cybersecurity books found at eDiscovery Security, especially the Cyberthriller novels. Check them out. I’m reading Trojan Horse right now. It has to do with a virus that allows documents to be altered en route after they are sent by email. Talk about an evidence authentication nightmare!

Remember, for full enjoyment of these videos press the HD button on the upper right corner, and then expand in the lower right for full size screen. Maybe someday we will do 3D and iMax too!

eDiscovery Robots Explain How STORYTELLING Will Be Used in Predictive Coding in the Not Too Distant Future

______

eDiscovery Robots Explain How GAMIFICATION Will Be Used in Predictive Coding in the Not Too Distant Future

______

 

eDiscovery Robots Explain ETHICS and Predictive Coding in the Not Too Distant Future


_____

eDiscovery Robots Explain How RANDOM SAMPLING is Used in Predictive Coding

 

____________

_______

___

Goodbye Lexie! We luv ya!
It was a great run while it lasted.
Who knows? Maybe you’ll return someday too?

 


Hacking Flash Trading on Wall Street: From Fiction to Fact in Just Three Weeks

June 22, 2014

I read Mark Russinovich’s new novel, Rogue Code, a few weeks ago when it was first released. The book is about flash trading and criminal hackers attacking Wall Street hedge funds. Then, just this week, I read a news flash on CNBC about a real-life hack attack on a Wall Street hedge fund. Cybersecurity firm says large hedge fund attacked (CNBC 6/19/14). Again, it involved the controversial practice of flash trading. The facts of the news report were eerily close to Russinovich’s fiction. The news report seemed to come right off the pages of Rogue Code. Unless this is an elaborate hoax to promote the book, Mark Russinovich has taken predictive coding to a new level.

Remarkable Parallels

In both the book and news report a sophisticated, highly organized team of skilled hackers penetrated what was thought to be a totally secure stock trading computer system. They then planted a very complex piece of software code, malware, that hid in the system. It operated undetected for months, taking a million here, a million there. The hidden program was remotely controlled to surreptitiously interfere with flash trading in order to direct profits to the hackers from intercepted trades. Millions of dollars were stolen over several months’ time.

In the novel and real world some suspicious circumstances caused the brokers to hire an outside cybersecurity firm to investigate their computer systems. The cybersecurity white hats finally discovered the malware. In the book the hero catches the bad guys. In real life no one seems to even have a clue as to who they are. They are at large, enjoying the rich life of the billionaires they stole from.

In the novel the penetration went beyond just one hedge fund into the very trading platform of the New York Stock Exchange. The whole world financial system was threatened. No one is saying if that has also happened in real life.

The cybersecurity company that broke the story, BAE Systems Applied Intelligence, made a point of saying that this kind of hack into stock trading systems, especially high-speed flash systems, has never been seen before. It may not have been seen, but Mark Russinovich certainly imagined it. The BAE spokesman, Paul Henninger, says that this hack represents a new level of attack involving both very advanced computer technical skills and advanced trading skills. Henninger says there are only a few experts in the world with the necessary skills to pull it off. Yet, this was all described in detail in Mark Russinovich’s novel. Kind of makes you wonder where Mark gets his material?

Cyber Thrillers

Russinovich is one of the best writers in the new fiction genre that I like, cyber thrillers. For a complete list of the most popular of these books that have a cybersecurity focus see my Must Read Books on Cybersecurity page, which is a part of eDiscoverySecurity.com. Rogue Code is Russinovich’s third in a series that started with Zero Day in 2010 and Trojan Horse in 2012. All three books in this series star Jeff Aiken, a cybersecurity expert who saves the world as a White Hat hacker. Jeff Aiken battles Black Hat bad guys and bureaucratic bumblers at the same time. Jeff Aiken is kind of a nerdy version of James Bond and serves as his own Q. He’s got some cool hacking tools that would even make the JΞSTΞR jealous.

I can really relate to Jeff Aiken’s constant frustration with small-minded government types that get in his way. They usually suspect him of being the bad guy. The real bad guys, the black hatters, usually come across as more sympathetic characters, which is one of the charms of the Jeff Aiken series. But the real attraction of his novels for me is how much you learn about cyber security while reading them.

Mark Russinovich and the Texas Instruments 99/4A

I figured Russinovich’s books were good, and accurate, and provided real insights, just based on the background of the author himself. Mark Russinovich is the real deal. He is now a Technical Fellow in the Cloud and Enterprise Division at Microsoft. I personally like him because at age 15, he bought himself his first computer, a TI99/4A. That was also my first personal computer and the first one I wrote programs for.

My kids still fondly remember my Make a Face program for the 99/4A. My daughter claims that was the world’s first avatar creation program, although at the time, to be honest, I thought of it as a high-tech Mr. Potato Head. You could make thousands of different-looking faces, and no matter what face you made, Mr. Computerhead was always happy with your design and said, with lips moving, I sure look good now! It was one of those games where you could not lose. I offered it for sale in the TI99/4A user group newsletter. I wonder if Mark was ever tempted to buy it? I say tempted, because I know for sure he did not buy it. Sadly, I never sold any, despite my one $25 ad, and so I concentrated instead on the life of a techno-trial lawyer and computer hobbyist.

Anyway, Mark Russinovich went on to become a real computer expert while I plugged along as a lawyer. Mark earned a B.S. in computer engineering from Carnegie Mellon University, a leading university for elite white hats. Then he received an M.S. in computer engineering from Rensselaer Polytechnic Institute. Then after some work in the real world, he returned to Carnegie Mellon for a Ph.D. in computer engineering in 1994. Yeah, Mark knows his stuff. Insofar as Microsoft products are concerned, he is one of the top experts in the world. He has personally discovered, and we assume quickly disclosed and fixed, many software errors and vulnerabilities that hackers could otherwise have exploited for fun and profit. Indeed, Mark now has a suspiciously large body of knowledge on how to hack into business systems of all kinds, especially those based on Microsoft operating systems.

Is Truth Stranger Than Fiction?

I had no idea how good his knowledge really was, and how close he was to the pulse of the elite hacking world, until reading the news story this week. It seemed to come right off the pages of his new book. I fully expect Jeff Aiken to be on the case right now tracking down the rogue coders who penetrated the hedge fund. I wonder if they are in Brazil watching the World Cup? In fact, come to think of it, the events Mark was writing about in Rogue Code were, we now know, taking place on the real Wall Street at the very same time he was writing about it. Hmm. What a coincidence. I wonder if well-known SEC investigator and attorney, Robert Ashton, will look into that? Too bad Patrick Oot has moved on. I’m sure he could e-discover the truth, that is, unless the Brazilian Mafia, the NL, got to him first.

For more about Rogue Code check out this video trailer. I think this book would make a great movie.

Of course, the facts in Rogue Code and the BAE Systems report are somewhat different. You would not want to be too obvious, would you? Still, to a careful reader of both stories, both fact and fiction, the similarities dominate. Both involved teams of experts working together to interfere with hedge fund flash traders to directly profit from the trades. Both involved long-term penetrations that lasted for months and resulted in the diversions (a polite word for theft) of millions of dollars. That’s right. This is big time cyber fraud, involving Big Data and Big Money and victims who usually will not want to complain. It makes for the perfect crime, especially if you like stealing from billionaires in a way that will likely go undetected.

Will the True Story of Wall Street Hacking Ever Be Known?

The full story of the real attack on the Wall Street flash trading hedge fund is still unknown. Indeed, the odds of our ever knowing the full truth of the real attack are slim to none. The as yet unnamed hedge fund has every incentive to keep it secret and keep its name out of the press. Think how its customers would react if they knew their money had been stolen in a hack attack. How would those customers, billionaires all, react if they found out that their brokers had been outsmarted by hackers? No. That would not work out too well. So, as we learn in Rogue Code the novel, these things are usually hushed up and the bad guys get away with millions.

Going back to real life and the BAE report, here is how CNBC quoted Paul Henninger:

“It’s pretty amazing,” Henninger said in an interview Wednesday from London. “The level of business sophistication involved as opposed to technical sophistication involved was something we had not seen before.”  . . .

Henninger said such business-savvy financial attacks can represent “the perfect crime,” because they are extremely difficult to trace to obscure locations around the globe, and because companies can be reluctant to go to law enforcement. “It often takes a while for firms to get comfortable with the idea of exposing what is in effect their dirty laundry to a law enforcement investigation,” Henninger said. “You can imagine the impact potentially on investor confidence.”

He said he does not know if the hedge fund reported the details of the attack—which he estimated cost the firm millions of dollars over just a few months’ time—to the SEC or the FBI.

Officials from the SEC and FBI declined to comment on this specific case.  . . .

Henninger said the malware represented a multimillion dollar problem for the hedge fund. “This was not something that was a minor issue for them,” he said. “This was something that was getting reviewed at the board level of this hedge fund precisely because it was having a material impact on performance across the portfolio.”

Public disclosure of illicit trading based on hacked information is exceedingly rare.

Eamon Javers, Cybersecurity firm says large hedge fund attacked (CNBC 6/19/14).

Conclusion

The introduction to Rogue Code was written by Haim Bodek, Managing Partner of Decimus Capital Markets, LLC. He is an expert on flash trading who is now sounding the alarm on the abuses that flash trading is causing on Wall Street. Even without cyber intrusions and theft by hackers, Bodek thinks the stock exchanges could fall because of the dishonesty and inherent unfairness of flash trading. I do not know about that, but I do know this micro-second trading gives an unfair advantage to some. We need a level playing field and a stock market that provides equal opportunities to all, including small investors. I hope that the alarm sounded by Haim Bodek about flash trading is overstated, but fear it is not. Rogue Code, and now the report by BAE, suggest that his concerns are well founded.

I am not delusional enough to think that the alarm sounded by Mark Russinovich on hacking Wall Street is a false alarm. That is a separate issue. I have no doubt in my mind that this is a clear and present danger. Although Rogue Code is a work of fiction, the hacking of Wall Street is not. The SEC must start taking cybersecurity more seriously. Indeed, all of us need to do that. Hackers are now getting organized and profit driven. This is not just an Anonymous group of kids anymore. These are criminal gangs. Hack attacks should be reported to the FBI. The days of secretive cover-ups must come to an end.


The “If-Only” Vegas Blues – Part Two

June 15, 2014

This is part two of this blog. I suggest you read part one first for this to make much sense.

What Happened When Plaintiff Started Carrying Out Its Keyword Agreement

I am sure you can guess what happened in Progressive after plaintiff’s counsel started trying to follow the keyword agreement. We have all seen this many times before. In Judge Leen’s own words, here is what happened:

Progressive and the FDIC–R agreed to search terms to be applied to ESI documents. Applying the search terms to the 1.8 million documents collected reduced potentially responsive documents culled by the search terms to approximately 565,000 documents. Progressive’s counsel began manually reviewing these 565,000 documents, but quickly determined that manual review was too time intensive and expensive. As a result, Progressive began exploring alternative approaches to the review and production of ESI.

In other words, Plaintiff’s counsel finally figured out that the keywords they had agreed to only served to reduce the documents to be manually reviewed from 1,800,000 to 565,000 documents. I suspect this large number of hits came as a surprise to them. Had they only sampled and tested they would have known better than to agree to this protocol, so I assume they did not.

After plaintiff’s counsel began the manual review of the 565,000 documents they no doubt realized how long review takes. My guess is that the reviewers could only review about 75 files per hour. Thus it would take about 7,533 attorney hours to complete the review. Assuming a fairly low contract review lawyer rate of only $40 per hour, that means a cost for manual review of over $300,000, which does not include the substantial additional law firm attorneys’ fees for supervision of this review. My guess is that the estimated costs to complete the manual review after keyword culling were between $350,000 and $400,000, possibly far more if the review speed was slower or rates higher. Every project is different for a variety of reasons.
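For readers who like to check the arithmetic, here is a minimal sketch in Python of the estimate above. The 565,000 documents, 75 files per hour, and $40 per hour are my guesses from the text. The function name and the slower 50 files per hour scenario are hypothetical, added only to show how sensitive the total is to review speed.

```python
def manual_review_cost(documents, docs_per_hour, hourly_rate):
    """Return (hours, cost) for a straight linear manual review.

    Covers first-pass reviewer time only; supervision fees are excluded.
    """
    hours = documents / docs_per_hour
    return hours, hours * hourly_rate


# Figures guessed at in the post: 565,000 hits, 75 files/hour, $40/hour.
hours, cost = manual_review_cost(565_000, 75, 40.0)
print(f"{hours:,.0f} hours, ${cost:,.0f}")

# Hypothetical slower pace (an assumption, not from the opinion).
slow_hours, slow_cost = manual_review_cost(565_000, 50, 40.0)
print(f"{slow_hours:,.0f} hours, ${slow_cost:,.0f}")
```

At 75 files per hour the reviewer time alone comes to roughly 7,533 hours and just over $300,000. Dropping the assumed pace to 50 files per hour pushes it past $450,000, which is why I say the total could be far more.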

Expert Arrives to Try to Save the Day

If I were a client hearing those costs for the first time, I would be upset. I would demand that my attorney do something. I suspect that is why the expert was finally brought into this case, to help reduce the projected costs. The expert, of course, recommended predictive coding, no doubt as part of a multimodal search effort. I would assume that after the client and plaintiff’s counsel heard the expert, they were happy to learn that the use of predictive coding would greatly reduce the projected expenses. I am sure they were also pleased to learn this method would be more effective at finding relevant documents and privileged documents. There was only one little problem: they had already agreed to manual review of the documents with key words, and this agreement had been reduced to a court order.

Here is how Judge Leen describes what happened:

Progressive began utilizing predictive coding techniques to review ESI without the Defendants’ agreement to amend the parties’ stipulated ESI protocol Order (Dkt.# 67), and without seeking leave of the court to amend the ESI order.

Judge Peggy Leen says they began using predictive coding techniques, and makes a big point that they did so without the court’s permission. But if you dig deeper into the opinion, it is apparent that plaintiff never actually got into full predictive coding training and review. They just began the process, and then went to the court for approval to carry it out.

Of course they had to begin the process in order to gather data as to how effective it might be and how it would work. How could they otherwise present their argument to change the protocol? In fact, all that plaintiff’s counsel did, no doubt under the supervision of their new expert, was to perform the initial random samples to determine prevalence and set up a control set of test documents. Somehow this point seems to have been lost in the opinion, which makes it seem like they did the entire predictive coding review before coming to the court. That appears to be a misunderstanding on the part of the court. In my standard protocol shown in the diagram below plaintiff had only done the first preliminary steps one, two, and three. They had only just begun. See LegalSearchScience.com and my Computer Assisted Review page.

[Diagram: predictive coding standard protocol]

The plaintiff in Progressive, according to my reading of the technical facts stated in the opinion, had not even started the iterative machine training, which is the heart of all predictive coding. They were far away from the all-important step seven of second pass final review. If they had done the whole predictive coding process, all eight steps, and then come to the court for approval, I could understand why the request would be denied as too late. But, as it is, the denial seems contrary to the dictates of Rule One to always aim for just, speedy and inexpensive discovery. Rule 1, FRCP.

You could not come to a court to seek a new plan without providing concrete facts that the old plan was not working, and at least an outline of a new, better plan. I assume that is why the plaintiff’s expert in Progressive performed step two, and brought in a subject matter expert to review the samples. This allowed them to determine prevalence of the keyword culled collection. According to the court’s opinion, Plaintiff’s new experts found that only approximately 55,765 documents were likely relevant out of the total 565,000. That proves that the keyword search was a total disaster. It proves that almost 90% of the documents with keywords were irrelevant. That is a pathetic 10% precision rate.
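The precision figure is easy to verify. Here is a minimal sketch in Python, using only the two numbers reported in the court’s opinion. The function and variable names are my own, chosen for illustration.

```python
def precision(relevant_retrieved, total_retrieved):
    """Share of retrieved documents that are actually relevant."""
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    return relevant_retrieved / total_retrieved


keyword_hits = 565_000      # documents hit by the agreed search terms
likely_relevant = 55_765    # estimate reported in the opinion

p = precision(likely_relevant, keyword_hits)
irrelevant_share = 1 - p
print(f"precision: {p:.1%}, irrelevant: {irrelevant_share:.1%}")
```

This works out to a precision of about 9.9 percent, which I round to ten percent. Put another way, reviewers would wade through roughly nine irrelevant documents for every relevant one.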

That does not surprise me. Indeed, any search Dilbert could have predicted that. Ten percent precision is bad, but I have seen worse. I once had a case much like this one, where I was brought in after the fact. I then found that almost 98% of the documents with keywords were irrelevant. This kind of evidence helped persuade the court to adopt a new plan. But that did not work here, and the only real clues we have are somewhat vague allegations in the Court order that plaintiff was not transparent enough, and did not cooperate. To me this smacks of a successful smoke and mirrors spin by counsel. That is a constant danger in highly technical areas like predictive coding unless you are very good at simplified, yet accurate explanations.

The expert showed opposing counsel, then later the court, that the old plan was not working. He showed how imprecise the keyword search was, how expensive it would be to complete the search, and presented an alternative plan using predictive coding. Opposing counsel did not care about plaintiff’s expense, nor precision. They just said a deal’s a deal and refused any compromise. Then the issue of predictive coding went to the court, where, as we know, it was promptly shot down.

Alleged Defects of Proposed Predictive Coding Plan

Here is Judge Peggy Leen’s explanation as to why she refused to allow plaintiff to switch to predictive coding:

Had the parties worked with their e-discovery consultants and agreed at the onset of this case to a predictive coding based ESI protocol, the court would not hesitate to approve a transparent mutually agreed upon ESI protocol. However, this is not what happened.

Aside from the too-late rationale, Judge Leen suggests that although she favors predictive coding, the details of the particular proposal made by plaintiff’s expert were somehow deficient. The opinion does not really spell out these deficiencies, at least not to my satisfaction. Judge Leen states:

Progressive’s proposal would relieve it of the burden of manual review of ESI according to the ESI protocol it stipulated to and allow it to adopt a predictive coding alternative to a small subset of the universe of ESI collected. Its proposal would also give its counsel exclusive responsibility for training the predictive coding software, and fails to comply with all of the best practices recommended by the Equivio software program. Progressive proposes a “do-over” of its own invention that lacks transparency and cooperation regarding the search methodologies applied.

Failure to comply with all of a software provider’s best practices? Please! They make software, not legal methods, not best practices. They are not lawyers. They do not set legal standards. They are engineers and businessmen. They make and sell software for profit. They cannot provide legal opinions on best practices. The expert retained by plaintiff in Progressive, by contrast, was a lawyer. In fact, he was one of the top experts in the field. What does an Israeli software company like Equivio have to do with legal standards in the U.S. justice system? Software is just a tool for legal search, much like a judge has a gavel as a tool. If a gavel came with recommendations on how to use it, would a judge’s call to order be invalid because she did it her own way? Her way might be far better than the default instructions.

So-called “best practices” by non-lawyer vendors are nothing more than suggestions for beginners. No vendor would ever purport to provide legal advice, nor opine that its software can only effectively be used in one way. I doubt very much that Equivio does so. They are too sophisticated for that. They understand full well the importance of legal experts. In fact, best practice number one of Equivio’s 2013 paper, Ten Essential Best Practices in Predictive Coding, is: “Choose the expert with due consideration.” The Equivio Best Practices paper wisely concludes by saying:

In conclusion, it should be noted that predictive coding is a dynamic, rapidly developing arena. The best practices described here will undoubtedly continue to evolve. In documenting them as they have taken shape over the past year or so, the intention is not to define a universal textbook template for the generic predictive coding project, but to provide a platform from which it will be possible to develop, refine and create new and better practices as the e-discovery industry continues to assimilate the game-changing technology of predictive coding.

Equivio’s article on Best Practices was written by Warwick Sharp, the Vice President of Marketing and Business Development for this software corporation based in Rosh Haayin, Israel. Mr. Sharp has a master’s degree in Political Thought from Tel-Aviv University. He has it right in stating that his company builds a platform, and others, namely legal and scientific experts, develop, refine and create new and better practices to use this game-changing technology. If details had been provided in Judge Leen’s opinion as to the alleged failures to comply with all of Equivio’s best practices, then perhaps we could understand what this comment was supposed to mean, but there were none. Again I suspect clever spin doctor arguments at work here by opposing counsel.

As to Judge Leen’s “lack of transparency” comment, we are again left guessing at what this means. It seemed like a pretty detailed disclosure to me. I am just guessing here, but usually the big issue in predictive coding is whether to disclose irrelevant training documents. My thinking on this has evolved since the Da Silva Moore case protocol where all irrelevant training documents were to be disclosed. Based on my own research, and that of others, only grey area irrelevant documents need to be disclosed for the process to work. These borderline documents are the only ones that make a difference in the training, as I have discussed in detail before. See, e.g., Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Part Three at the subheadings Disclosure of Irrelevant Training Documents and Conclusions Regarding Inconsistent Reviews.

Probably the experts disagreed on this point. There is still not widespread consensus on the issue of disclosure of irrelevant training documents. No doubt defense counsel wanted to look at everything, even though the science does not support that. Maybe that was the key issue, maybe not. The opinion does not really say. If that was the issue, then I would have preferred the court to rule on it and allow predictive coding to be used according to what the judge thought was a reasonable disclosure. Give the proponent the choice to go forward on that basis or not.

The only specific objection made by defense counsel that I could see had to do with the objection that the predictive coding was based on a keyword filtered subset of 565,000 documents, and not the full collection of 1,800,000. This objection by the defense was noted in Judge Leen’s opinion, but it was not analyzed, nor included as a specific basis for the denial. Judge Leen just used the dispute on this issue as an example of plaintiff’s alleged failure to cooperate. This two-filter culling approach is controversial primarily because the first filter, keyword, usually does such a poor job of recall. It filters out too much. The only real justification for this approach is cost. Most experts agree a filter of some kind should be used for the pool of documents to be subject to predictive coding, but it should usually not be a simplistic keyword filter as proposed here. For instance, non-text documents should be filtered out since predictive coding now uses text analytics. Other types of documents should be filtered out too for best results.
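To see why a weak first filter matters so much, consider a minimal sketch in Python of how recall compounds across two sequential filters. The recall values here are purely hypothetical, invented for illustration. The opinion reports no recall figures for either stage. The sketch also assumes the second-stage recall is measured against the relevant documents that survived the first filter.

```python
def end_to_end_recall(stage_recalls):
    """Multiply per-stage recalls to get recall against the full collection.

    Each value is the fraction of the relevant documents entering that
    stage which the stage keeps.
    """
    overall = 1.0
    for recall in stage_recalls:
        if not 0.0 <= recall <= 1.0:
            raise ValueError("recall must be between 0 and 1")
        overall *= recall
    return overall


keyword_recall = 0.50     # hypothetical: keywords find half the relevant documents
predictive_recall = 0.80  # hypothetical: predictive coding finds 80% of what remains

overall = end_to_end_recall([keyword_recall, predictive_recall])
print(f"end-to-end recall: {overall:.0%}")
```

Under these assumed numbers, even a strong predictive coding effort can recover no more than what the keyword cull left behind, so the overall recall drops to 40 percent of the relevant documents in the full collection.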

Again, I would have preferred a ruling on this two-filter issue, perhaps after hearing evidence, or at least a detailed argument with proffers. Then the proponent should be given the opportunity to go forward on the basis of the ruling or not. But that did not happen, instead we get an order like this:

Progressive is unwilling to engage in the type of cooperation and transparency that its own e-discovery consultant has so comprehensibly and persuasively explained is needed for a predictive coding protocol to be accepted by the court or opposing counsel as a reasonable method to search for and produce responsive ESI. Progressive is also unwilling to apply the predictive coding method it selected to the universe of ESI collected. The method described does not comply with all of Equivio’s recommended best practices. The court agrees with the FDIC–R that approving Progressive’s predictive coding proposal, or for that matter, the FDIC–R’s competing predictive coding protocol, will only result in more disputes. It will also further delay completion of discovery in this 2011 filed case.

Court’s Punitive Order in Progressive

The court ordered the plaintiff to produce all of the 565,000 “hit” documents within fourteen days without further review. The court justifies this seemingly harsh result with the following palliations:

Under these circumstances, the court will require Progressive to produce the “hit” documents to the FDIC–R within fourteen days without further review. The court recognizes that requiring production of all of the “hit” documents will likely result in the production of documents not responsive to the FDIC–R’s discovery requests. However, the parties’ stipulated ESI protocol adopted this approach as one of two alternatives for Progressive’s production. Progressive elected and then abandoned the second option-to manually review and produce responsive ESI documents. It abandoned the option it selected unilaterally, without the FDIC–R’s acquiescence or the court’s approval and modification of the parties’ stipulated ESI protocol. Adopting the FDIC’s proposal of producing the “hit” documents will shift the cost of review to the FDIC–R. The FDIC–R has committed to devote the resources required to review the documents as expeditiously as possible and estimates the process could be completed in about a month by tapping into the resources of the numerous firms involved in these related actions who also have a substantially similar ESI protocol entered in their actions. It will allow discovery, which has been stalled for many months while this dispute is pending, to move forward, and reduce future disputes about Progressive’s ESI production.

But what about the right of plaintiff to protect its privileged communications in the forced production of 565,000 documents, you might well ask? Judge Leen seems mindful as to the apparent harshness of her order, and offers this as a consolation.

Progressive may apply privilege filters to the 565,000 documents retrieved by use of the search terms and withhold more likely privileged documents identified by use of privilege filters. Progressive shall serve a privileged document log which fully complies with Rule 26(b)(5)(A) for any documents withheld or redacted as privileged or subject to protection as trial-preparation material. To reduce the burden of preparation of a privileged document log, the court will permit Progressive to serve three privileged documents logs for one-third of the documents withheld or redacted at thirty day, sixty day, and ninety day intervals.

No word yet on whether plaintiff has appealed this order. I suspect it will. But in the meantime you can bet plaintiff’s counsel are working feverishly on privilege filters to try to protect their client’s privileged communications. Clawback orders can never un-ring a bell.

Conclusion

Progressive shows the dangers of not involving expert search counsel at the very beginning of a case. This may seem a little self-serving on my part, but the message in Progressive is loud and clear – hire a predictive coding search consultant as soon as you realize you have a big data case. If you are going to have to review several hundred thousand documents, then it is probably big enough to use predictive coding. Bring in a legal search expert. If there are more than a million documents, as in Progressive, then you obviously should use AI-enhanced review. Progressive shows that it is ill-advised to wait to hire a legal search consultant. Do not be penny wise and pound foolish, bring one of us in right away. If you wait too long, it may be too late for us to help you.

Progressive also shows, once again, that keyword search alone does not work to find targeted documents. It is not a magic incantation. In Progressive only one out of ten of the documents with keywords was at all relevant. That is very imprecise. It creates a huge waste of reviewer time doing second review. Every predictive coding project I have ever worked on has had far better precision than that. My experience is not unique. Other tests show the recall is also better when you include active machine learning. It is better than using keyword and manual review alone. I might also add that similarity and concept type searches are also good. Use all of the tools provided by modern search and review software.

Finally, Progressive shows that, timeliness issues aside, judges understand the efficacy of predictive coding. Judge Leen said she believed that predictive coding was better than the keyword protocol adopted. She said she would have approved it, if only it had been brought to her earlier.

Do not make the same mistake as plaintiff’s counsel in Progressive. Hire a predictive coding expert sooner rather than later and so avoid the “if only” Vegas blues.

