e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part Two

October 19, 2014

This is part two of an article providing an e-discovery industry insider’s view of the possible purchase of Equivio by Microsoft. Please read Part One first. So far the acquisition by Microsoft is still just a rumor, but do not be surprised if it is officially announced soon.

Another e-discovery insider has agreed to go public with his comments, and three more anonymous submissions were received. Let’s begin with these quotes, and then I will move on to some analysis and opinions on this deal and its likely impact on our industry.

More Industry Insider Comments

Jon Kerry-Tyerman (VP, Business Development, Everlaw): “If you think about this potential acquisition in the context of the EDRM, it makes a lot of sense. The technological issues on the left-hand side—from Information Governance through Preservation and Collection—are primarily search-related, rather than discovery-related.  And the technology behind search is largely a problem that’s been solved. That’s why we see these tasks being commoditized by the providers of the systems on which these data reside, entrenched players like Microsoft and Google. Microsoft has already shown a willingness to wade deeper here (see, e.g., Matter Center for Office 365), so the acquisition of Equivio’s expertise to improve document search and organization within the enterprise is a logical extension of that strategy.

I don’t think, however, that this heralds an expansion by Microsoft into the wider “ediscovery” space. The tasks on the right-hand side of the EDRM—particularly Review through Presentation—depend on expert legal judgment. While technology cannot supplant that judgment, it can be used to augment it. Doing so effectively, however, requires a nuanced understanding of the unique legal and technological problems underlying these tasks, and the resulting solutions are not easily applicable to other domains. For a big fish like Microsoft, that’s simply too small a pond in which to swim. It happens to be the perfect environment for a technology startup, however, which is why we’re focusing exclusively on applying cutting-edge computer science to the right-hand side of the EDRM—including our proprietary (read: non-Equivio!) predictive coding system.”
_________________

Anonymous One (a tech commentator outside the e-discovery world, who provides an interesting outsider view): “I read the commentary and found it to be fairly eDiscovery introspective. What I think is:

  1. I don’t know the Equivio markets as well as I should. I thought Equivio was/is a classification engine that did a wonderful job of deduplication of email threads. They played in the eDiscovery markets and we don’t focus on these markets except for their relevance to information governance.
  2. Equivio lacked a coherent strategy to integrate to the Microsoft stack, at the level of managed metadata, content types, and site provisioning, which doomed them to bit player status unless someone acquired them or they committed to tight integration with the hybrid SharePoint/Office 365/Exchange/OneDrive/Yammer/Delve/File Share stack for unified content governance. Now someone has. Hats off to Warwick & Co. for $200MM for this.
  3. My expectation is that Equivio will be added into Office 365 and Delve to crawl through everything you own and classify it, launching whatever processes you want. This is not good news for Content Analyst, dataglobal, Nuix, HP Autonomy, or Google, except that Google and HP are able to stand on their own. It is also not good news, but less bad news for Concept Searching, Smart Logic and BA Insight, in that they leverage SharePoint and Office 365 Search and extend it with integration points and connectors to other systems.
  4. Microsoft is launching Matter Center at LegalTech in NYC in February after announcing it at ILTA. This is the first of the vertical solutions that begin the long journey of companies to adopt either the Microsoft or Google cloud solution stacks and abandon the isolated silos of information like Box, Dropbox, etc., for the corporate side of information management.”

_____________

Anonymous Two: “It’s an interesting move for Microsoft. $200M is a little high for tools in our industry, but is peanuts for them. They make dozens of these types of moves and spend billions each year acquiring various companies and technologies. I agree with Craig Ball regarding how many times have we seen formidable competitors go the way of the Dodo after they were purchased by a bigger company. I highly doubt they are planning to jump into our industry to lock horns with all of us. It is more likely that they may be developing some sort of Information Governance & Analysis offering for businesses, which could have some downstream effects on eDiscovery.”

_____________

Anonymous Three: “The acquisition of Equivio by Microsoft and the price paid are not a complete surprise. I agree with others who do not see this as a sign of Microsoft entering the ediscovery business. If Microsoft wanted to do that it could acquire any of the big ediscovery players out there. Rather, the Equivio acquisition allows Microsoft to offer a service that other big data companies cannot. Putting aside HP’s acquisition of Autonomy, I think Microsoft’s acquisition of Equivio is only the first of what will be a series of technology acquisitions by big data companies. These companies, that handle terabytes upon terabytes of data for major corporations around the world, can one day provide ediscovery as an additional service offering. That day isn’t today, but it is coming.”

What Microsoft Will Do With Equivio

The consensus view is that after the purchase Microsoft will essentially disband Equivio and absorb its technology, its software designs, and some of its experts. Then, as Craig Ball predicts, they will wander the halls of Redmond like the great cynic Diogenes. No one seems to think that Microsoft will continue Equivio’s business. For that reason it would make no sense for Microsoft to continue to license the Equivio search technologies to e-discovery companies. That in turn means a large part of the e-discovery industry that now depends on Equivio search components, and licenses with Equivio, will soon be out of luck. Zoom will go boom! More on that later.

If Microsoft did not buy Equivio to continue its business, why did it want its technology? As the scientists I talked to all told me, Microsoft already has plenty of artificial intelligence based text search capabilities, software, and patents. But maybe they are not designed for searching through large disorganized corporate datasets, such as email? Maybe their software in this area is not nearly as good as Equivio’s. As smart and talented as my scientist friends seem to think Microsoft is, the company seems to have a black hole of incompetence when it comes to email search and other aspects of information management.

The consensus view is that Microsoft wants Equivio to grab its technology and patents (at least one commentator thought they were also after Equivio’s customers). The Microsoft plan is probably to incorporate Equivio’s software code into various existing Microsoft products and new products under development. Almost no one expects those new products to be e-discovery specific. They might, however, help provide a discovery search overlay to existing software. Outlook, for instance, has pathetic search capacities that frustrate millions daily. Maybe they will add better e-discovery aspects to that. I personally expect (hope) they will do that.

Information Governance Is Now King

I also agree with the consensus view in our industry, a view that is now preoccupied with Information Governance, that Microsoft’s new products using Equivio technology will be information governance products. I expect Microsoft to once again follow IBM and focus on the left side of EDRM. I expect Microsoft to come out with new Governance type products and software module add-ons. I do not think that Microsoft will go into litigation support specific products, such as document review software, nor litigation search oriented products. Like IBM, they think it is still too small a market and too specialized a market.

Bottom line, Microsoft is not interested in entering the e-discovery specific market at this time, any more than IBM is. Instead, like most (but not all, especially Google) of the smart creatives of the technology world, Microsoft has bought into the belief that information is something that can be governed, can be managed. They think that Information Governance is like paper records management, just with more zeros after the number of records involved. The file-everything librarian mentality lives on, or tries to.

The Inherent Impossibility, in the Long Run, of Information Governance

Most of the e-discovery world now believes that Information Governance is not only possible, but is the savior from the information deluge that floods us all. I disagree, especially in the long run. I appear to be a lone dissenting voice on this in e-discovery. I think the establishment majority in our industry is deluding itself into thinking that information is like paper, only there is more of it. They delude themselves into thinking that Information is capable of being governed, just like so many little paper soldiers in an army. I say the Emperor has no clothes. Information cannot be governed.

Electronic Information is a totally new kind of force, something Mankind has never seen before. Digital Information is a Genie out of the bottle. It cannot be captured. It cannot be managed. It certainly cannot be governed. It cannot even be killed. Forget about trying to put it back in the bottle. It is breeding faster than even Star Trek’s Tribbles could imagine. As Baron and Paul discussed in their important 2007 law review article, ESI is like a new Universe, and we are living just moments after the Big Bang. George L. Paul and Jason R. Baron, Information Inflation: Can the Legal System Adapt?, 13 RICH. J.L. & TECH. 10 (2007).

What few outside of Google, Baron, and Paul seem to grasp is that Information has a life of its own. Id. at FN 30 (quoting Ludwig Wittgenstein (a 20th Century Austrian philosopher whom I was forced to study while in college in Vienna): “[T]o imagine a language is to imagine a form of life.”) Electronic information is a new and unique life form that defies all attempts at limitation, much less governance. As James Gleick observed in his book on information science, everything is a form of information. The Universe itself is a giant computer and we are all self-evolving algorithms. Gleick, The Information: A History, a Theory, a Flood.

Essentially information is free, and wants to be free. It does not want to be governed, or charged for. Information is more useful when free and when it is not subject to transitory restraints.

Regardless of the economic aspects, and whether information really wants to be free or not, as a practical matter Information cannot be governed, even if some of it can be commoditized. Information is moving and growing far too fast for governance.

Digitized information is like a nuclear reaction that has passed the point of no return. The chain reaction has been triggered. This is what exponential growth really means. In time such fission vision will be obvious. Even people without Google glasses will be able to see it.

In the meantime we have a new breed of information governance experts running around who serve like heroic bomb squads. Some know that it is just a noble quest, doomed to failure. Most do not. They helicopter into corporate worlds attempting to defuse ticking information bombs. They build walls around it. They confidently set policies and promulgate rules. They talk sternly about enforcement of rules. They automate filing. They automate deletion. Some are even starting to make robot file clerks.

Information governance experts, just like the records managers before them, are all working diligently to try to solve today’s problems of information management. But, all the while, ever new problems encroach upon their walls. They cannot keep up with this growth, the new forms of information. The next generation of exponential growth builds faster than anyone can possibly govern. Do they not know that the bomb has already exploded? That the tipping point has already passed?

Information governance policies that are being created today are like sand castles built at low tide. Can you hear the next wave of data generated by the Internet of Things? It will surely wash away all of today’s efforts. There will always be more data, more unexpected new forms of information. Governance of information is a dream, a Don Quixote quest.

Information cannot be governed. It can only be searched.

In my view we should focus on search technologies, and give up on governance. Or at least realize it is a mere stop-gap measure. In the world I see, search is king, not governance. Do not waste your valuable time and effort trying to file information. Just search for it, when and if you need it. You will not need most of it anyway.

I do not really think Microsoft has the fission vision, but I could be wrong. They may well see the world like I do, and like Google does, and realize that it is all search now. Microsoft may already understand that information governance is just a subset of search, not vice versa. Maybe Microsoft is already focused on creating great new search software that will help us transition from governance to search. Maybe they hope to remain relevant in the future and to compete with Google. No one knows for sure the full thinking behind Microsoft’s decision to buy Equivio.

The majority of experts are probably right: Microsoft probably does have information governance software in mind when buying Equivio. Microsoft probably still hangs onto the governance world view, and does not see it my way, or Google’s way, that it is all about search. Still, by buying good search code from Equivio, Microsoft cannot go wrong. Eventually, after the governance approach fails, which I predict will happen in ten years, or less, and Microsoft and the governance experts finally see the world like Google and me, it will help to have Equivio’s code as a foundation.

What Happens If Zoom Goes Boom?

In the short term, what companies may be adversely affected by the exit of Equivio from the e-discovery market? For the obvious answer, you have only to check out Equivio’s installed base web page. The most prominent Equivio user, as everyone in the e-discovery industry already knows, is K-Cura and its Relativity review platform. In fact, Relativity is the “featured user,” among all others, of Equivio’s text analytics. Equivio even includes a page on its website that promotes the Equivio Zoom tab on Relativity’s software.

So what happens if Zoom goes boom? Will naive law firm and corporate consumers not notice? Most probably can be hyped. Hey, after all, predictive coding software is all alike, right? Well, no, it is not. There are huge differences in quality. There are huge differences in how well one company’s predictive coding features work as compared to another. That is exactly why Equivio existed, to license technologies to fill the gap. Equivio did the research and code development on search so that most other vendors in the industry would not have to. These vendors could license Equivio’s work fairly inexpensively, and use the money saved to create other features, other software, like litigation hold notice management.

Only a few vendors have taken the time, and very considerable expense, to develop their own active machine learning software features, instead of licensing them from Equivio. These vendors will now reap the rewards of having the rug pulled out from under most of their competitors. Eventually even lawyers will realize that search quality does matter, that all predictive coding software programs are not alike.

K-Cura is not the only one who should be worrying here. There is a long list of other key users of Equivio products. They include, according to the list on Equivio’s own website:

  • Concordance by Lexis Nexis
  • DT Search
  • EDT
  • Xera iConnect
  • iPro
  • Law PreDiscovery by Lexis Nexis
  • Thomson Reuters

In addition, Equivio’s installed base web page lists the following companies and law firms as users of their technology. It is a very long list, including many prominent vendors in the space, and many small fry that I have never heard of. They should all be concerned, to one degree or another, according to how dependent they are on Equivio software or software components.

  • Altep
  • BDO Consulting
  • Bowman & Brooke
  • CACI
  • Catalyst
  • CDCI research
  • Commonwealth Legal
  • Crown Castle
  • D4
  • Deloitte
  • Dinsmore
  • Discover Ready
  • Discovia
  • doeLegal
  • DTI
  • e-Stet
  • e-Law
  • Envision Discovery
  • Epiq Systems
  • Ernst & Young
  • eTera Consulting
  • Evidence Exchange
  • Foley & Lardner
  • FTI Consulting
  • Gibson Dunn
  • Guidance Software
  • H&A
  • Huron
  • ILS Innovative Litigation Services
  • Inventus
  • IRIS
  • KPMG
  • USIS Labat
  • Law In Order
  • LDiscovery
  • Lightspeed
  • Lighthouse eDiscovery
  • Logic Force Consulting
  • Millnet
  • Morgan Lewis
  • Navigant
  • Night Owl Discovery
  • Nulegal
  • Nvidia
  • ProSearch Strategies
  • PWC
  • Qualcomm
  • Reed Smith
  • Renew Data
  • Responsive Data Solutions
  • Ricoh
  • RVM
  • Shepherd Data Services
  • Squire Sanders
  • Stroock
  • TechLaw Solutions
  • Winston & Strawn

This is Equivio’s list, and it may not be current, nor even accurate (some of the links were broken), but it is what is shown on Equivio’s website as of October 14, 2014. Do not blame me if Equivio has you on the list, and you should not be, but feel free to leave a comment below to set the record straight. Hopefully, many of you have already moved on, and no longer use Equivio anyway. I happen to know that is true for a few of the companies on that list. If not, if you still rely on Equivio, well, maybe Microsoft will still do business with you when it is time to renew, but most think that is very unlikely.

Conclusion

It is hubris to think that a force as mysterious and exponential as Information can be governed. Yet it appears that is why Microsoft wants to buy Equivio. Like most of establishment IT, including the vast majority of pundits in our own e-discovery world, Microsoft thinks that Information Governance is the next big thing. They think that predictive coding was just a passing fad that is now over. If these assumptions are correct, then we can expect to see fragments of Equivio’s code appear in Microsoft’s future software as part of general information governance functions. We will not see Microsoft come out with predictive coding software for e-discovery.

Once again, Microsoft is missing the big picture here. Like most IT experts today outside of Google, they do not understand that Search is king, and governance is just a jester. The last big thing, Search, especially AI enhanced active machine learning, i.e., predictive coding, is still the next big thing. Information governance is just a reactive, Don Quixote thing. Not big at all, and certainly not long-lasting. If anything, it is the dying gasp of last century’s records managers and librarians. Nice people all, I’m sure, but then so was John Henry.

Microsoft’s absorption of Equivio is a setback for search, for legal e-discovery. But at the same time it is a boon for the few e-discovery vendors who chose not to rely on Equivio, and chose instead to build their own search. It is also a boon for Google, as, once again, Microsoft shows that it still does not get search. You will not see Google fall for a governance dream.

Search is and will remain the dominant problem of our age for generations. Information cannot be governed. It cannot be catalogued. It can only be searched. Everyone needs to get over the whole archaic notion of governance. King George died long ago.

Google has it right. We should focus our AI development on search, not governance. Spend your time learning to search, and forget about filing. Filing is a hopeless waste of time. It is just like the little Dutch boy putting his finger in the dike. Learn to swim instead. Better yet, build a search boat like Noah and leave the governor behind.


e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part One

October 12, 2014

On Oct. 7, 2014, the Wall Street Journal reported that Microsoft had signed a letter of intent to buy what they called an Israel-based text analysis startup company named Equivio. The mainstream business press has virtually no understanding of the e-discovery industry, nor anything having to do with litigation support. They also seem to have no real grasp of what kind of software Equivio and others like it in the industry have created. They have probably never even heard of predictive coding! The business press, for all of these reasons and more, has no idea why Microsoft would pay $200 Million to buy Equivio. But, as this blog will show, we in the e-discovery community have plenty of ideas about that, and plenty to say about the whole deal.

Unlike the general business press, including the prestigious Wall Street Journal, everyone in the e-discovery industry knows, or at least knows of, Amir Milo, Yiftach Ravid and Warwick Sharp. We all know their company, Equivio. Even though the Wall Street Journal calls Equivio a startup company, we all know that is not true. Equivio’s IPO was in 1998. Milo, Ravid and Sharp have been part of the e-discovery world from the very beginning. I must admit, however, that no one seems to know, not even Milo himself, why Equivio’s website always shows them hiding behind funny paper documents. But at least now we know why they are smiling.

The WSJ reported that the deal could still fall through, and neither side would comment. Of course, as all lawyers know, deals can always fall through; that is not news. But, in my experience, once the letter of intent stage is reached, and it is leaked, it is pretty much a done deal, barring only unforeseen due diligence problems. So, assuming the deal does go through, what does this mean to the e-discovery industry?

I asked a few of the leaders in the e-discovery world for their reaction to the Microsoft Equivio deal. Most of them responded, some on the record, some off, and some both on and off. Here is my report on what everyone seems to agree is very big news indeed, at least for our industry.

Business Press View of the Microsoft Equivio Deal

Before I share the industry insights, it is interesting to see how the general business world views the deal. First, as the WSJ article that broke the news exemplifies, they do not even seem to know that an e-discovery industry exists, nor that Equivio is part of it. Instead, the WSJ just describes Equivio as a startup company that has created:

text analysis software that can group together relevant texts from large amounts of documents—including emails and other organizational social and collaboration networks—using machine learning algorithms. The algorithms generalize samples of texts marked as relevant to the issue at hand to apply the sorting logic to groups of texts, such as legal documentation.

The article states that the technology is already in use by organizations that provide litigation support services to law firms and corporate legal departments. Well, at least the business world seems to know that there is some sort of litigation support industry. The job our industry supposedly performs is also misunderstood. It is oversimplified to the point of absurdity. The WSJ level of comprehension in this area is exceedingly low. They think the job of litigation support is to, as they put it, try to extract relevant data, such as legal contracts, from massive amounts of documents.

So apparently the tens of thousands of us in the e-discovery world spend our professional lives trying to find legal contracts. My, what idiots we must all be! And by implication, what idiots Microsoft must be to spend $200 Million for a company that has developed software with machine learning to find contracts. Wrong! There is much more to predictive coding than meets the eye of the average business journalist, most of whom have never even heard of the term, much less of Mr. Milo. Whatever the final price may be, the Two Hundred Million number sounds about right to me. No one else I talked to seemed shocked by the price either, which was certainly not true about Hewlett Packard’s ill-fated purchase of Autonomy for $10 Billion. Although a few friends I talked to did say that the next time they have dinner with Milo they are going to let him pick up the check.

Industry Insider On-The-Record Comments

I will start off by reporting the comments where I have been given permission to quote with attribution. Then I will share a few comments where I was provided permission to quote, but not provide attribution. Some insiders also provided interesting background type information and speculation, which I do not have permission to quote or cite in any way. These comments inform my own opinions, which, you can rest assured, will also appear in this blog, but in Part Two, along with any straggler comments I may receive.

But first the attributable quotes, with thanks to the many who quickly responded to my vague questions, and agreed to go on the record about this very interesting, yet, as of today, still only rumored deal. As you can see my favorite industry insiders have a lot to say about this deal. Some of it is kind of corporate approved general writing, but there are also some controversial and strong opinions in here. Plus you will find some deep thoughts about our industry in general, not just this one deal.

Jason R. Baron (Of Counsel, Drinker, Biddle & Reath LLP, who broke the news to me of this story and so gets the lead): “I consider the deal to be a good thing for the legal tech sector. As we see bigger players with more market power recognizing the value that firms like Equivio contribute, we can hope that the bigger firms will leverage their greater influence to accelerate adoption of good IG practices.”

_________________

Craig, Ralph & Jason at some sort of e-disco event

Craig Ball (ESI Special Master and Attorney,  Computer Forensic Examiner, Author and Educator): “I see two options for an acquired Equivio: Either it empowers Equivio to grow in the legal marketplace, or (and the smart money’s here), it spells the disappearance of Equivio from the legal marketplace. If Equivio’s technology isn’t destined to wander the halls in Redmond like Diogenes carrying a Zune, it will be dedicated to internal use or baked into offerings not geared to e-discovery.

What I do not think the acquisition signals is a desire by Microsoft to compete in the fledgling e-discovery marketplace. Microsoft isn’t buying Equivio for its nascent presence in e-discovery. It wants Equivio’s technology–maybe for 365, maybe for Bing, maybe for a product yet to be named. Sometimes, big companies buy technology to stick it in a drawer. Look at how many great products went to Lexis-Nexis to die. One thing is fairly certain, you can bid Equivio adieu from litigation support.  …

As to the price paid for Equivio, it’s a windfall for Equivio; but for Microsoft, the price is as impactful as it would be for you or I to purchase a good steak.”
_______________
John Grancarich (Vice President, Product Management, Kroll Ontrack): “Microsoft is continuing to expand its data analysis offerings to provide solutions for larger data management problems. The ediscovery marketplace has some compelling and cutting-edge technologies that can move Microsoft closer toward achieving those broader goals. A few years back, Microsoft representatives started attending EDRM conferences – and other such think-tank meetings – which certainly helped them attain a deeper level of understanding about the robust capabilities across the entire technology provider landscape. When you think about combining the power of Microsoft with the software and service capabilities of key players in the ediscovery industry, the future ahead is exciting.”
_______________
John Tredennick (Founder and Chief Executive Officer of Catalyst): “I was surprised at the news but extend my congratulations to Amir and the Equivio team for their successful outcome.
I have no inside information on this deal but would suspect that Equivio’s move into the Information Governance space might have been as attractive to Microsoft as their e-discovery background. IG systems can involve billions of documents (rather than the millions in e-discovery) so the problems are of a different magnitude. Systems that can make sense of such volumes will be at a premium in this new era of information management.”
_______________

Kenneth J. Withers (Deputy Executive Director, The Sedona Conference): “First, all we know is that there is a letter of intent signed between Microsoft and Equivio, in which Microsoft states its intention to acquire Equivio for $200 million. We don’t have any details on exactly why MSFT is doing this, or what it plans to do with Equivio’s products, current client base, IP, or staff. Based on a quick look at Equivio’s web site, it doesn’t look like they have any high-ranking female executives, so I think we can rule out “executive and staff diversity” as a goal in this acquisition. Beyond that, it may be a mistake for readers of eDiscovery team to think that entry into the legal marketplace is a goal, either. The eDiscovery market is – or soon will be – dwarfed by the larger Information Governance (IG) market, and that is an area in which MSFT really needs to step up its game, especially in relation to SharePoint. For several years, information professionals have viewed SharePoint as a proverbial poisoned apple, symbolizing the potential for both great knowledge and great sin. You can’t keep people from biting at the apple, you can only manage the consequences of immediate expulsion from IG Eden into a wilderness of terabytes of fractured data. I’m sure it has not been lost on MSFT executives that there are at least a dozen third-party IG solution providers purporting to tame data generated by Microsoft Office applications and stored in SharePoint. There are many facets of IG that are amenable to smart automated solutions. As volumes of data continue to grow exponentially, an advanced data analytics component to an overall IG suite of applications is absolutely essential. Readers of this blog may think first in terms of eDiscovery, but for many companies providing data analytics solutions, eDiscovery has become the market testing ground for the much more lucrative IG market.

If you can make it work for the litigators, there is hope for the rest of the world. And not only would you have a good IG tool, but you would also still have a solid eDiscovery tool that could be built into a client’s (or law firm’s) Microsoft deployment. So this acquisition might also be another step in the mainstreaming of eDiscovery – taking some of the most costly and least lawyerly tasks out of the hands of the big law firms and third-party legal service providers, and enabling small businesses, small law firms, and even individuals to cost-effectively engage in eDiscovery. As I mentioned, I don’t think that is MSFT’s primary motivation. But if MSFT offers a serious eDiscovery tool to the masses, I think that many of the legal service providers (and large law firms with eDiscovery search and processing divisions) will need to reexamine their business models.

Largely through its eDiscovery offerings, Equivio has built a solid reputation and a respectable client base, so I am not surprised that MSFT would look to it as a potential partner (or meal, depending on your point of view) to add advanced data analytics to an emerging suite of applications for the IG market. And for MSFT, the $200 million offer is equivalent to a rounding error in their overseas tax liability, so it’s a bargain, too. And I don’t equate this with HP’s acquisition of Autonomy at all.

All this is speculation, of course. I’m not qualified to predict what MSFT will do, but by acquiring Equivio, MSFT is positioning itself to compete with IBM (which has Watson and lots of other R&D in the works) and Google (which is, after all, Google) in large-scale data management through analytics.”

_______________

Bill Hamilton (Partner, Quarles & Brady): “I see it as a seismic shift.”

_____________

J. William “Bill” Speros (Attorney Consulting in Litigation Management): “Practical. Rational. Responsible. Therefore, not natural for a big company.”

_____________

Bruce I. Blank (Director, Litigation Services & Support, Foley & Lardner LLP): “With an organization like Microsoft and the endless list of research they are conducting, you can only guess at why they would purchase Equivio, and that is exactly what I am going to do: guess. Microsoft has been inching closer and closer to the discovery world for several years now. With the rollout of Exchange 2013 and the in-place discovery search tool being integrated into the discovery management system, it is clear there is a defined focus on collection for hold and discovery purposes. There was an announcement today that with Office 365 and OneDrive Microsoft is going to separate attachments from emails. The first reaction is that it sends a chill down one’s back, at least for those who have had to deal with linked attachments to emails in the past, but maybe Microsoft has solved that problem with key pointers that will preserve family relations. Where does Equivio fit in? Microsoft is using Keyword Query Language (KQL), which they claim will easily construct powerful search queries to search content indexes for both on-premises and on-line; however, this is just keyword searching, which is very limiting in many ways. But what if you incorporate Equivio’s analytics in conjunction with the KQL? Keyword searching combined with predictive coding analytics raises the bar of credibility. Maybe this now-common workflow is necessary to make this palatable to the corporate market.

I am not really sure at the end of the day what gets accomplished using Microsoft e-discovery tools, particularly if we are dealing with a large organization with many moving parts. Microsoft clearly says we can search it if it is in our ecosystem, but if not, you need another way to search for discovery. In other words, if you are using OneDrive, SharePoint, Exchange, or Office products, for example, then they will be able to search it. If you are using any Apple or Google tools (or many others) on your computer, they might not get searched, just tagged as un-searched or un-readable, thus potentially throwing any search term reports off and possibly even requiring a third party vendor to finish the job to put the pieces back together again. Relying on an inexperienced attorney or IT member crafting the search strategy, coupled with the limits of Microsoft e-discovery tools, could be a recipe for disaster.

Well, as I said, all of this is only a guess. There is much to learn yet about what they are up to, but on the face of it, I wonder what the discovery attorneys I work with would have to say about clients’ IT getting even more involved in the discovery process. The discovery industry has grown in sophistication not only by the technology tools used but by the flesh the attorneys have had to pay for the inconsistencies and inaccuracies of tools given to our IT friends. There are some good things that Microsoft is doing in discovery, but I would certainly recommend solid experienced guidance with your discovery projects. I started this with just a guess, but if my guess is close, then one must ask not only who will be doing discovery in a few years but what “won’t” be discovered?”

_______________

Melinda Levitt (Partner, Foley & Lardner LLP): “Microsoft is certainly a giant of the electronic information world and in many, many ways it has completely changed the way that people communicate in the modern world. But, as an attorney who practices in the ediscovery “space” – I have seen no indication that Microsoft is a player, or has any real experience or expertise in this field. In matters in which we have been involved, we have had clients who bought tools that they were told would make preservation and collection easier to do and it could all be done in-house – and we have seen the significant flaws with those systems because they were not designed by attorneys – or the specialist litigation technician – who really understood, based on hands on experience over many years, the nuances of ediscovery and what is needed . . . what is responsive, or maybe responsive . . . what may be confidential and worthy of protection, or what is not. Etc. Perhaps Microsoft has such people and is working with them to take us to a next generation of ediscovery/big data management – including most particularly managing enormous caches of emails and making them searchable with advanced analytics . . . . but without some indication that they understand the intricacies of what is involved – that special place where sophisticated legal skill, very specialized technical understanding, and the “art” of practicing law meet – then I remain skeptical and worried.”

______________

Gregory P. Bufithis (Attorney & Managing Director, eTERA Consulting Europe): “I am intrigued by the Microsoft/Equivio tie up. My initial reaction was the deal makes sense given Nadella’s strategy for the “New Microsoft”, what he calls the “data analysis” Microsoft. And I agree with Ralph: it seems to be an admission by Microsoft that they do not have any real AI capacity as concerns document search. But I thought Microsoft would have gone after somebody else. The acquisition of Equivio is no surprise. From what I understand, they have had a book out on the street for the past 12+ months. It will be interesting if anyone else makes a play now.

But as Ralph says, one’s first thought: is Microsoft really serious about playing in “our” legal sandbox? Just a few points:

One point to immediately dismiss: the purchase price of $200 million being bandied about in the press. In the Bloomberg review, the analyst thought $200 million was far too high. I have been following the business press in Israel and the general take is the price will be “far lower”. But we have no way of knowing since we have not seen any press releases, no signed letter of intent being waved about. The 8-K disclosure requirements do not mandate the disclosure of letters of intent and other non-binding agreements so I doubt we’ll see the LOI via an S.E.C. filing by Microsoft. But if it is $200 million … way to go Amir!!

Nadella [Microsoft CEO, Satya Nadella, shown right] has been trying hard to redefine his company for the post-Gates/Ballmer era. If you took the time to read that 3,100 word “positioning memo” he sent out over the summer to every Microsoft employee (and to the world in general) it’s all in there. But as a media guy who spends a lot of time in the digital biosphere, I have a short note to Mr Nadella: your memo was waaaaay too long with too many messages. Your troops either stopped reading it or just forgot it as soon as they scanned it. A video would have worked better at that length [point of reference: Tim Cook’s excellent video to his troops after taking command of Apple]. And Satya, I mean really: fake cheerleading and empty words like “synthesize” and “potential” and “revolution” will bring out all of the cynics. Like me.

Yet I found it a fascinating document for many reasons. Talk is cheap and Nadella has to produce and match his talk of “potential” and “synthesize” and “revolution” with “real substance”. You cannot talk your way into continued technology leadership. A press release is fine and we all love cultural revolutions (and, yes, I admit it: he does look perfect in a shirt and jacket and jeans; Tim Cook would love to look that good) but he has a problem: he begins at the altar of innovation and for Microsoft that means a tradition of pretty much stealing technology, so Microsoft’s “tradition of innovation” is a bit hard to even detect, much less revive. …

Nadella has talked endlessly about Microsoft remaining important to personal and organizational productivity by emphasizing, it seems, the coordination of information in a world where users have multiple devices and there are a growing number of devices independent from any user. Oh, you know. That damn, that infernal Internet of Things (IoT). But an obvious problem: for the first time in a long time Microsoft is not a leader in any of this. Microsoft is just one of many companies in analytics and business intelligence. Yes, he sleeps a little better knowing Samsung must continue to pay him $1 billion+ a year in patent licenses because Samsung phone technology is dependent on Microsoft tech they patented but never did much with. At least on that score the old Microsoft … Bill Gates’ all embracing essence of Microsoft … was a bit innovative, establishing de facto standards. But while Windows is the top OS, it’s pretty much ignored in mobile and IoT.

Yes, Microsoft makes a boatload of money. But in Silicon Valley there are two sayings that everyone regards as truth. One is that profits follow relevance. The other is that there’s a difference between strategic position and financial position. It’s easy to be in denial and think the financials reflect the current reality. They do not. Around three-quarters of Microsoft’s profits come from the two fabulously successful products on which the company was built: the Windows operating system, which essentially makes personal computers run, and Office, the suite of applications that includes Word, Excel, and PowerPoint. Financially speaking, Microsoft is still extraordinarily powerful. In the last 12 months the company reported sales of $86+ billion and earnings of $22+ billion. It has $85+ billion cash on its balance sheet. But the company is facing a confluence of threats that is all the more staggering given Microsoft’s sheer size. Competitors such as Google and Apple have upended Microsoft’s business model, making it unclear where Windows will fit in the world, and even challenging Office.  …

Yes, Microsoft has a boatload of money and thousands of good employees but its management culture works against true innovation. Nadella figures he’ll “buy” that culture (Minecraft and Equivio being examples) to right the ship. So how does Equivio figure in this? A few points:

  1. Nadella is an engineer with advanced degrees in computer science. So he knows that clean logical code simply does not exist in some abstract conceptual space. It “plays” in a complexly shaped, intricately interacting digital information universe. We all know that having been in the e-discovery trenches. And Microsoft and Equivio have been in the trenches together. Equivio has been working with many Microsoft technologies … including Windows XP, SQL Server and SharePoint Server … since 2006, if not earlier. One thing the Microsoft reps noted at LegalTech this past year was that the integration of Equivio’s technology added an important layer of structure to SharePoint data repositories. One chap said “we have seen that it clearly has expedited a corporation’s response to e-discovery requests, internal investigations and regulatory tasks.” And we know from market chats that Microsoft and Equivio have talked about integrating Equivio technology into “Track Changes” and “Compare Documents” and other functionalities within Word. So we must assume that Nadella pretty much knows what he is buying. He now has the chance to weave together the “potential” and the “synthesis” and the “revolution” and the “real substance” he has been talking about. Granted, at a low level but a key one in keeping with the e-discovery model they have been building.
  2. Microsoft has been pushing machine learning in more of its products. I saw Nadella speak earlier this year and his whole focus was “data analysis” and “machine learning.” We know that machine learning traditionally requires complex software, high-end computers, and seasoned data scientists who understand it all. Nadella’s pitch has been that for many startups and even large enterprises “it’s simply too hard and expensive.” So he has moved to bring machine learning/predictive analytics to a more accessible level to a much broader audience. Equivio can help that.
  3. But the point (always in these acquisitions): can Microsoft execute? Can they integrate Equivio? The problem with Microsoft has been they lack consistency and perseverance. They always seem to be looking for quick success. Each Microsoft leader comes in and implements his own stuff. And I will not even get into the infamous Gates/Ballmer internal wars after Ballmer took over. … You only need to look at a company like Symantec to realize one can gobble up big and small vendors alike but you really need to integrate, market, and sell them. I have studied the M&A market. Rarely is an acquisition failure a function of buying bad companies/bad technologies. Almost every time it is a direct function of the inability to execute on a broad vision. I will use … yet again … my essay on the HP acquisition of Autonomy back in 2011. HP wanted to make itself more like IBM, which had been successfully revived by Louis Gerstner. But Gerstner refused to act precipitously after taking over IBM. He had a plan, he had a vision and he and his team slowly showed their vision worked by establishing their operational credibility. They satisfied the Board, they satisfied the market, they satisfied the pundits. That has been something HP and Symantec have never been able to do.
  4. And the biggie vis-à-vis Equivio: in so many cases Microsoft has acted the “rogue” takeover company, attacking companies to reduce their value then buying them up without the knowledgeable team in place. In the press it became known as “Microshaft”. It would take over a company and then every single person with a brain would immediately resign leaving the source code in the hands of interns and idiots. Will Nadella do the smart thing and keep the Equivio team? Otherwise … as an analyst said about a previous Microsoft acquisition … “they end up handing the new source code over to those without a clue, those who don’t care and they create Windoze.”

… Will the electronic data discovery (EDD) herd be winnowed down? The EDD pundits say eventually there will be only 7-10 companies doing this. Up until this week I would have not put Microsoft (via Equivio) on that list. But nobody can accurately predict because we are so, so early in this game.” For more on Greg Bufithis’ many interesting thoughts on the deal, see the Project Counsel blog.

_______________

Rob Robinson (Managing Partner, ComplexDiscovery Solutions): “The rumored acquisition highlights the ever-increasing need for organizations to do more for information governance purposes than just store documents. Equivio will offer Microsoft enterprise customers additional ways to organize, cull, and work with their text-based documents. However, the combined offering may still leave organizations challenged in terms of being able to work effectively with non-text or poor-text documents – and in some industries like Oil & Gas, those types of documents can account for a significant percentage of entire collections. So bottom line: appears to be good progress for Microsoft in dealing with text-based documents, but still a need and opportunity for technologies dealing with all document formats.”

_______________

William Webber (Information Scientist, eDiscovery Consultant): “It’s not the case that Microsoft lacks the capacity for applying AI to document search; indeed, the research group at Bing has been somewhat in advance of Google in applying machine learning approaches to web search. I think that a company like MS buys a company like Equivio not for their technology in the abstract (and while I think Equivio has been successful in finding the appropriate application of technology to e-discovery, I don’t think that their technology itself is all that revolutionary); rather, a MS would buy an Equivio as a path into an industry, first for having a concrete product, and second for having practical experience (and a customer base) in that industry. Is Equivio a good choice for MS on this basis? Well, I guess time will tell. My main query here would be that Equivio seem to have an offering narrowly focused on certain technologies, most specifically predictive coding, while leaving much of the surrounding work (processing, review, producing) to other products. MS-Equivio have to decide whether they’re going to continue in this technological niche, or to expand to a full e-discovery management system. Staying in the technological niche seems an odd choice, and a potentially perilous one, as it assumes that full-featured providers will not be able to replicate Equivio’s technology offering on their own. But will an Equivio under MS ownership have the focus, drive, and industry understanding to expand their offering to a full e-discovery suite?”

_______________

Barclay T. Blair (President and founder of ViaLumina and the Executive Director and founder of the Information Governance Initiative): “In the past two weeks, three of the Information Governance Initiative’s vendor supporters have been involved in M&A transactions. Fontis International was purchased by Iron Mountain, and, according to some reports, Equivio is being acquired by Microsoft. These transactions are exciting to me as they provide the latest validation that the information governance market is taking hold. In the case of Fontis, Iron Mountain was attracted to the central IG rules engine that Fontis provides. In the case of Equivio, I can only surmise that Microsoft was not only interested in Equivio’s predictive coding offering for e-discovery, but also in the long-term growth opportunity represented by its productization of predictive coding for proactive management and remediation of content, another IG use case. I expect that we will see other transactions involving leading IG companies as this market matures and develops. This topic has come up in advisory sessions innumerable times in the past couple of quarters both with my IG provider clients and with investors. Hot acquisition targets are any company that provides automation of IG, or so-called auto-classification. Everybody is looking for it and the smart money realizes that there is a huge opportunity there for companies who can bring it to market in a way that is easy, cost-effective and that scales across multiple categories (a very different use case than e-discovery). There are dozens of small, specialized companies in that space. Remember it encompasses a much broader world than e-discovery. Clearly there will be massive consolidation as the big horizontal enterprise software companies move into this space. Enterprise software companies hear the same thing over and over from their customers: we need help managing the deluge of unstructured information.

Even the new wave of content management vendors like Box, which claim to “not be your father’s content management system,” are adding decidedly unsexy features like workflow and retention management.

The most important question moving forward is: who has the data? As more data flows out of company owned and operated data centers and into broad horizontal cloud environments like Google, Amazon, IBM, Microsoft, HP, and even Apple and “data lake” providers like Pivotal, and into vertical-specific hosted applications (construction management, sales management, manufacturing management), I believe we will see wide-scale adoption of things like industry-standard taxonomies, automated retention schedules, file plans, information protection programs, etc. The vendors who hold this data are starting to see and realize the opportunity to add increasingly valuable services on top of the data.”

Industry Insider Off-The-Record Comments

The following are verbatim quotes, but without attribution. All I can say is that these insiders work for large organizations and do not have permission to speak on the record. But hey, I got them to speak! I will let you imagine who said what.

Anonymous One: “EDiscovery is data dialysis. We take the data out of the corporate body and scrub it, then re-inject the data into the enterprise requirements flow (in this case, production of responsive ESI). In the long run, if the corporate filters are functioning well (like a healthy liver) externalizing ESI to filter it appropriately will not be necessary. EDiscovery will be part of enterprise IT, inside the firewall. Microsoft apparently believes this.

Microsoft only acquires 2.0 and above versions. Their faith that advanced analytics of ESI by Equivio are at the 2.0 level is good news for every predictive coding solutions provider.

Predictive coding is now in play for enterprise IT. Look for more acquisition and integration.”

_________________

Anonymous Two: “Anomaly or harbinger? The discovery market still seems very fragmented and immature, and curiously so. If we were to start the clock around the time the 2006 amendments came into force and compare the discovery market to other markets, say in some technology sectors, the discovery market has remained very much immature. Just when one thinks that the market is starting to rationalize, new entrants arrive. At times it feels like a game of whack-a-mole. Whether this market immaturity is desirable or undesirable depends on your point of view. Put another way, how do the buyers of discovery services view a more consolidated market? Is it a good thing, as it may engender standardization and more efficient vendor management, or will it result in less competition in the market? I just don’t know.

Economic theory most certainly teaches that, in the long run, the market will consolidate and mature but, then again, “in the long run we are all dead.” (A quotation attributed to John Maynard Keynes, I believe).”

________________

Anonymous Three: “There is no question that Microsoft was a pioneer in business software. But the kind of workflow automation software at the core of eDiscovery and information governance is not Microsoft’s area of expertise, which is likely why they’ve waited so long to enter this market.

The easiest way to close gaps in their information governance and eDiscovery portfolio is by acquiring technologies like Equivio, but that still leaves a lot of unanswered questions for their customers and the eDiscovery market as a whole. For example, all of the eDiscovery providers who license Equivio’s technology for predictive coding may find themselves in a major bind because they don’t own their own machine learning technology. It will be interesting to see how this move impacts companies like kCura and their Relativity product.

On the customer front, Microsoft also needs to figure out how to handle data collection from laptops, desktops and systems beyond Exchange and SharePoint. There are also a lot of lingering questions about Microsoft’s search limitations, scalability and integration between multiple on-premises and cloud offerings that have to be answered.

Buying Equivio could help solve problems related to analytics, review and production, but they still have a long way to go to catch a lot of enterprise eDiscovery and information governance competitors who have a big head start.”

________________

Stay tuned for Part Two of this blog next week where I will try to synthesize all of these great comments and bring it all to a conclusion. Are you an industry leader that would like to comment, but your words have to be undercover? Perhaps you are with Microsoft or Equivio? Or one of their competitors, where it might be unseemly to speak on the record, especially if it is not all politically correct? I will keep your identity secret and I never reveal my sources. Ready to talk openly, but I just failed to ask? My bad. Send me an email. I did not have time to ask all of the important folks that I wanted to hear from. I will try to include your remarks in Part Two.


What Can Happen When Lawyers Over Delegate e-Discovery Preservation and Search to a Client, and Three Kinds of “Ethically Challenged” Lawyers: “Slimy Weasels,” “Gutless,” and “Clueless”

September 21, 2014

“I see nothing, NOTHING!” Sergeant Schultz

Bad things tend to happen when lawyers delegate e-discovery responsibility to their clients. As all informed lawyers know, lawyers have a duty to actively supervise their client’s preservation. They cannot just turn a blind eye; just send out written notices and forget it. Lawyers have an even higher duty to manage discovery, including search and production of electronic evidence. They cannot just turn e-discovery over to a client and then sign the response to the request for production. The only possible exception proves the rule. If a client has in-house legal counsel, and if they appear of record in the case, and if the in-house counsel signs the discovery response, then, and only then, is outside counsel (somewhat) off the hook. Then they can lay back, a little bit, but, trust me, this almost never happens.

To see a few of the bad things that can happen when lawyers over delegate e-discovery, you have only to look at a new district court opinion in Ohio. Brown v. Tellermate Holdings Ltd., No. 2:11-cv-1122 (S.D. Ohio July 1, 2014) (2014 WL 2987051). Severe sanctions were entered against the defendant because its lawyers were too laid back. The attorneys were personally sanctioned too, and ordered to pay the other side’s associated fees and costs.

The attorneys were sanctioned because they did not follow one of the cardinal rules of attorney-client relations in e-discovery, the one I call the Ronald Reagan Rule, as it is based on his famous remark concerning the nuclear arms treaty with the USSR: Trust but verify.

The sanctioned attorneys in Brown trusted their client’s representations to them that they had fully preserved, that they had searched for the evidence. Do not get me wrong. There is nothing wrong with trusting your client, and that is not why they were sanctioned. They were sanctioned because they failed to go on to verify. Instead, they just accepted everything they were told with an uncritical eye. According to the author of the Brown opinion, U.S. Magistrate Judge Terence P. Kemp:

… significant problems arose in this case for one overriding reason: counsel fell far short of their obligation to examine critically the information which Tellermate [their client] gave them about the existence and availability of documents requested by the Browns. As a result, they did not produce documents in a timely fashion, made unfounded arguments about their ability and obligation to do so, caused the Browns to file discovery motions to address these issues, and, eventually, produced a key set of documents which were never subject to proper preservation. The question here is not whether this all occurred – clearly, it did – but why it occurred, and what, in fairness, the Court needs to do to address the situation which Tellermate and its attorneys have created.

Id. at pgs. 2-3 (emphasis added).

What is the Worst Kind of Lawyer?

Taking reasonable steps to verify can be a sticky situation for some lawyers. This is especially true for ethically challenged lawyers. In my experience lawyers like this generally come in three different varieties, all repugnant. Sometimes the lawyers just do not care about ethics. They are the slimy weasels among us. They can be more difficult to detect than you might think. They sometimes talk the talk, but never walk it, especially when the judge is not looking, or they think they can get away with it. I have run into many slimy weasel lawyers over the years, but still, I like to think they are rare.

Other lawyers actually care about ethics. They know what they are doing is probably wrong, and it bothers them, at least somewhat. They understand their ethical duties, they also understand Rule 26(g), Federal Rules of Civil Procedure, but they just do not have the guts to fulfill their duties. They know it is wrong to simply trust the client’s response of no, we do not have that, but they do it anyway. They are gutless lawyers.

Often the gutless suffer from a combination of weak moral fibre and pocketbook pressures. They lack the economic independence to do the right thing. This is especially true in smaller law firms that are dependent on only a few clients to survive, or in siloed lawyers in a big firm without proper management. Such gutless lawyers may succumb to client pressures to save on fees and just let the client handle e-discovery. I have some empathy for such cowardly lawyers, but no respect. They often are very successful; almost as successful as the slimy weasels types that do not care at all about ethics.

There is a third kind of lawyer, the ones who do not even know that they have a personal duty as an officer of the court to supervise discovery. They do not know that they have a personal duty in litigation to make reasonable, good faith efforts to try to ensure that evidence is properly preserved and produced. They are clueless lawyers. There are way too many of these brainless scarecrows in our profession.

I do not know which attorneys are worse. The clueless ones who are blissfully ignorant and do not even know that they are breaking bad by total reliance on their clients? Or the ones who know and do it anyway? Among the ones who know better, I am not sure who is worse either. Is it the slimy weasels who put all ethics aside when it comes to discovery, and are not too troubled about it? Or is it the gutless lawyers, who know better, and do it anyway out of weak moral fortitude, usually amplified by economic pressures? All three of these lawyer types are dangerous, not only to themselves and their clients, but to the whole legal system. So what do you think? Please fill out the online poll below and tell us which kind of lawyer you think is the worst.

 

I will not tell you how I voted, but I will share my personal message to each of the three types. There are not many slimy weasels who read my blog, but I suspect there may be a few. Be warned. I do not care how powerful and protected you think you are. If I sniff you out, I will come after you. I fear you not. I will expose you and show no mercy. I will defeat you. But, after the hearing, I will share a drink with some of you. Others I will avoid like the plague. Evil comes in many flavors and degrees too. Some slimy weasel lawyers are charming social engineers, and not all bad. The admissions they sometimes make to try to gain your trust can be especially interesting. I protect the confidentiality of their off-the-record comments, even though I know they would never protect mine. Those are the rules of the road in dancing with the devil.


As to the gutless, I am pretty sure that a few of my readers fall into that category, although not many. To you I say: grow a spine. Find your inner courage. You cannot take money and things with you when you die. So what if you fail financially? So what if you are not a big success? It is better to sleep well. Do the right thing and you will never regret it. Your family will not starve. Your children will respect you. You will be proud to have them follow in your footsteps, not ashamed. I will not have drinks with gutless lawyers.

As to the clueless, none of my readers by definition fall into that category, but I have a message for you nonetheless: wake up, your days are numbered. There are at least three kinds of clueless lawyers and my attitude towards each is different. The first kind is so full of themselves that they have no idea they are clueless. I will not have drinks with these egomaniacs. The second type has some idea that they may need to learn more about e-discovery. They may be clueless, but they are starting to realize it. I will share drinks with them. Indeed, I will try very hard to awaken them from their ethically challenged slumber. The third kind is like the first, except that they know they are clueless and they are proud of it. They brag about not knowing how to use a computer. I will not have drinks with them. Indeed, I will attack them and their stone walls almost as vigorously as the weasels.

Judges Dislike the Clueless, Gutless, and Slimy Weasels

Judges dislike all three kinds of ethically challenged lawyers. That is why I was not surprised by Judge Kemp’s sanction in Brown of both the defendant and its attorneys. (By the way, I know nothing about defense counsel in this case and have no idea which category, if any, they fall into.) Here is how Judge Kemp begins his 47-page opinion.

There may have been a time in the courts of this country when building stone walls in response to discovery requests, hiding both the information sought and even the facts about its existence, was the norm (although never the proper course of action). Those days have passed. Discovery is, under the Federal Rules of Civil Procedure, intended to be a transparent process. Parties may still resist producing information if it is not relevant, or if it is privileged, or if the burden of producing it outweighs its value. But they may not, by directly misrepresenting the facts about what information they have either possession of or access to, shield documents from discovery by (1) stating falsely, and with reckless disregard for the truth, that there are no more documents, responsive or not, to be produced; or (2) that they cannot obtain access to responsive documents even if they wished to do so. Because that is the essence of what occurred during discovery in this case, the Court has an obligation to right that wrong, and will do so in the form of sanctions authorized by Fed. R. Civ. P. 37.

Take these words to heart. Make all of the attorneys in your firm read them. There are probably a few old-school types in your firm on whose office walls you should post the quote, no matter which type they are.

Brown v. Tellermate Holdings Ltd.

The opinion in Brown v. Tellermate Holdings Ltd., No. 2:11-cv-1122 (S.D. Ohio July 1, 2014) (2014 WL 2987051) by U.S. Magistrate Judge Terence Kemp in Columbus, Ohio, makes it very clear that attorneys are obligated to verify what clients tell them about ESI. Bottom line – the court held that defense counsel in this single plaintiff, age discrimination case:

… had an obligation to do more than issue a general directive to their client to preserve documents which may be relevant to the case. Rather, counsel had an affirmative obligation to speak to the key players at [the defendant] so that counsel and client together could identify, preserve, and search the sources of discoverable information.

Id. at pg. 35.

In Brown the defense counsel relied on representations from their client regarding the existence of performance data within a www.salesforce.com database and the client’s ability to print summary reports. The client’s representations were incorrect and, according to the court, had counsel properly scrutinized the client’s representations, they would have uncovered the inaccuracies.

As mentioned, both defendant and its counsel were sanctioned. The defendant was precluded from using any evidence that would tend to show that the plaintiffs were terminated for performance-related reasons. This is a very serious sanction, which is, in some ways, much worse than an adverse inference instruction. In addition, both the defendant and its counsel were ordered to jointly reimburse plaintiffs the fees and costs they incurred in filing and prosecuting multiple motions to compel various forms of discovery. I hope it is a big number.

The essence of the mistake made by defense counsel in Brown was to trust, but not verify. They simply accepted their client’s statements. They failed to do their own due diligence. Defense counsel aggravated their mistake by a series of overly aggressive discovery responses and argumentative positions, including such things as over-designation of AEO confidentiality, a document dump, failure to timely log privileged ESI withheld, and refusal to disclose search methods used.

The missteps of defense counsel are outlined in meticulous detail in this 47-page opinion by Judge Terence Kemp. In addition to the great quotes above, I bring the following quotes to your attention. Still, I urge you to read the whole opinion, and more importantly, to remember its lessons the next time a client does not want you to spend the time and money to do your job and verify what the client says. This opinion is a reminder for all of us to exercise our own due diligence and, at the same time, to cooperate in accord with our professional duties. An unsophisticated client might not always appreciate that approach, but it is in their best interests, and besides, as lawyers and officers of the court, we have no choice.

[when e-discovery is involved] Counsel still have a duty (perhaps even a heightened duty) to cooperate in the discovery process; to be transparent about what information exists, how it is maintained, and whether and how it can be retrieved; and, above all, to exercise sufficient diligence (even when venturing into unfamiliar territory like ESI) to ensure that all representations made to opposing parties and to the Court are truthful and are based upon a reasonable investigation of the facts.

 Id. at Pg. 3.

As this Opinion and Order will explain, Tellermate’s counsel:

- failed to uncover even the most basic information about an electronically-stored database of information (the “salesforce.com” database);

- as a direct result of that failure, took no steps to preserve the integrity of the information in that database;

- failed to learn of the existence of certain documents about a prior age discrimination charge (the “Frank Mecka matter”) until almost a year after they were requested;

- and, as a result of these failures, made statements to opposing counsel and in oral and written submissions to the Court which were false and misleading, and which had the effect of hampering the Browns’ ability to pursue discovery in a timely and cost-efficient manner (as well as the Court’s ability to resolve this case in the same way).

These are serious matters, and the Court does not reach either its factual or its legal conclusions in this case lightly.

Id. at pg. 4.

In addition to the idea that discovery is broad and is designed to permit parties to obtain enough evidence either to prove their claims or disprove the opposing party’s claim, discovery under the Federal Rules of Civil Procedure has been designed to be a collaborative process. As one Court observed,

It cannot seriously be disputed that compliance with the “spirit and purposes” of these discovery rules requires cooperation by counsel to identify and fulfill legitimate discovery needs, yet avoid seeking discovery the cost and burden of which is disproportionally large to what is at stake in the litigation. Counsel cannot “behave responsively” during discovery unless they do both, which requires cooperation rather than contrariety, communication rather than confrontation.

Mancia v. Mayflower Textile Servs. Co., 253 F.R.D. 354, 357-58 (D. Md. 2008). Such a collaborative approach is completely consistent with a lawyer’s duty to represent his or her client zealously. See Ruiz-Bueno v. Scott, 2013 WL 6055402, *4 (S.D. Ohio Nov. 15, 2013). It also reflects a duty owed to the court system and the litigation process.

Id. at pgs. 28-29. Also see: Losey, R. Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.).

Tellermate, as an entity, knew that every statement it made about its control over, and ability to produce, the salesforce.com records was not true when it was made. It had employees who could have said so – including its salesforce.com administrators – had they simply been asked. Its representations were illogical and were directly contradicted by the Browns, who worked for Tellermate, had salesforce.com accounts, and knew that Tellermate could access those accounts and the information in them. And yet Tellermate’s counsel made these untrue statements repeatedly, in emails, letters, briefs, and during informal conferences with the Court, over a period of months, relenting only when the Court decided that it did not believe what they were saying. This type of behavior violated what has been referred to as “the most fundamental responsibility” of those engaged in discovery, which is “to provide honest, truthful answers in the first place and to supplement or correct a previous disclosure when a party learns that its earlier disclosure was incomplete or incorrect.” Lebron v. Powell, 217 F.R.D. 72, 76 (D.D.C. 2003). “The discovery process created by the Federal Rules of Civil Procedure is premised on the belief or, to be more accurate, requirement that parties who engage in it will truthfully answer their opponents’ discovery requests and  consistently correct and supplement their initial responses.” Id. at 78. That did not happen here.

Id. at pg. 31.

But it is not fair to place the entire blame on Tellermate, even if it must shoulder the ultimate responsibility for not telling counsel what, collectively, it knew or should have known to be the truth about its ability to produce the salesforce.com information. As this Court said in Bratka, in the language quoted above at page 3, counsel cannot simply take a client’s representations about such matters at face value. After all, Rule 26(g) requires counsel to sign discovery responses and to certify their accuracy based on “a reasonable inquiry” into the facts. And as Judge Graham (who is, coincidentally, the District Judge presiding over this case as well, and whose views on the obligations of counsel were certainly available to Ms. O’Neil and Mr. Reich), said in Bratka, 164 F.R.D. at 461:

The Court expects that any trial attorney appearing as counsel of record in this Court who receives a request for production of documents in a case such as this will formulate a plan of action which will ensure full and fair compliance with the request. Such a plan would include communicating with the client to identify the persons having responsibility for the matters which are the subject of the discovery request and all employees likely to have been the authors, recipients or custodians of documents falling within the request. The plan should ensure that all such individuals are contacted and interviewed regarding their knowledge of the existence of any documents covered by the discovery request, and should include steps to ensure that all documents within their knowledge are retrieved. All documents received from the client should be reviewed by counsel to see whether they indicate the existence of other documents not retrieved or the existence of other individuals who might have documents, and there should be appropriate follow up. Of course, the details of an appropriate document search will vary, depending upon the circumstances of the particular case, but in the abstract the Court believes these basic procedures should be employed by any careful and conscientious lawyer in every case.

 Id. at pgs. 32-33.

Like any litigation counsel, Tellermate’s counsel had an obligation to do more than issue a general directive to their client to preserve documents which may be relevant to the case. Rather, counsel had an affirmative obligation to speak to the key players at Tellermate so that counsel and client together could identify, preserve, and search the sources of discoverable information. See Cache La Poudre Feeds, LLC v. Land O’ Lakes, Inc., 244 F.R.D. 614, 629 (D. Colo. 2007). In addition, “counsel cannot turn a blind eye to a procedure that he or she should realize will adversely impact” the search for discovery. Id. Once a “litigation hold” is in place, “a party cannot continue a routine procedure that effectively ensures that potentially relevant and readily available information is no longer ‘reasonably accessible’ under Rule 26(b)(2)(B).” Id.

Id. at pg. 35.

As noted above, Tellermate and its counsel also made false representations to opposing counsel and the Court concerning the existence of documents relating to the Frank Mecka matter. Indeed, at the hearing on the pending motions, Tellermate’s counsel stated that she was unaware of the existence of the great majority of the Frank Mecka documents until almost a year after they were requested. Once again, it is not sufficient to send the discovery request to a client and passively accept whatever documents and information that client chooses to produce in response. See Cache La Poudre Feeds, 244 F.R.D. at 629.

 Id. at pg. 37 (emphasis added).

There are two distinct but related problems with trying to remedy Tellermate’s failings concerning these documents. The first is the extremely serious nature of its, and counsel’s, strenuous efforts to resist production of these documents and the strident posture taken with both opposing counsel and the Court. Perhaps the most distressing aspect of the way in which this was litigated is how firmly and repeatedly counsel represented Tellermate’s inability to produce these documents coupled with the complete absence of Tellermate’s compliance with its obligation to give counsel correct information, and counsel’s complete abdication of the responsibilities so well described by this Court in Bratka. At the end of the day, both Tellermate’s and its counsel’s actions were simply inexcusable, and the Court has no difficulty finding that they were either grossly negligent or willful acts, taken in objective bad faith.

Id. at pg. 43.

The only realistic solution to this problem is to preclude Tellermate from using any evidence which would tend to show that the Browns were terminated for performance-related reasons. … This sanction is commensurate with the harm caused by Tellermate’s discovery failures, and is also warranted to deter other similarly-situated litigants from failing to make basic, reasonable inquiries into the truth of representations they make to the Court, and from failing to take precautions to prevent the spoliation of evidence. It serves the main purposes of Rule 37 sanctions, which are to prevent parties from benefitting from their own misconduct, preserving the integrity of the judicial process, and deterring both the present litigants, and other litigants, from engaging in similar behavior.

Id. at pg. 45.

Of course, it is also appropriate to award attorneys’ fees and costs which the Browns have incurred in connection with moving to compel discovery concerning the salesforce.com documents and the Mecka documents, and those fees and expenses incurred in filing and prosecuting the motion for sanctions and the motion relating to the attorneys-eyes-only documents. … Finally, Tellermate and its counsel shall pay, jointly, the Browns’ reasonable attorneys’ fees and costs incurred in the filing and prosecution of those two motions as well as in the filing of any motions to compel discovery relating to the salesforce.com and Frank Mecka documents.

Id. at pgs. 45-46.

So sayeth the Court.

 Conclusion

The defendant’s law firm here did a disservice to their clients by not pushing back, and by instead simply accepting their clients’ report on what relevant ESI they had, or did not have. Defense counsel cannot do that. We have a responsibility to supervise discovery, especially complex e-discovery, and be proactive in ESI preservation. This opinion shows what happens when a firm chooses not to be diligent. The client loses and the lawyers are sanctioned.

Our obligation as attorneys of record does not end with the client’s sending a litigation hold notice. If a client tells us something regarding the existence, or more pointedly, the non-existence, of electronically stored information that does not make sense, or seemingly is contradicted by other evidence, it is critical for an attorney to investigate further. The client may not want you to do that, but it is in the client’s best interests that you do so. The case could depend upon it. So could your license to practice law, not to mention your reputation as a professional. Looking the other way is never worth it. It is far better to sleep well at night with a clear conscience, even if it sometimes means you lose a client, or are generally not as successful, or rich, as the few ethically challenged lawyers who appear to get away with it.


Caveat Emptor – Beware of Vendor Trickery

September 18, 2014

In a crowd of e-Discovery Vendors, where each claims to have the Best Software

HOW CAN YOU KNOW WHO IS TELLING THE TRUTH?

Watch this short video animation below for one answer to that question, and yes, this is somewhat self-promotional, but still true.


____________

___________

Only trust independent expert commentators and peer reviewed scientific experiments.

 _____

A full blog on lawyer ethics and an important new case on diligence is coming on this blog soon.


Guest Blog: Talking Turkey

September 7, 2014

EDITOR’S NOTE: This is a guest blog by Gordon V. Cormack, Professor, University of Waterloo, and Maura R. Grossman, Of Counsel, Wachtell, Lipton, Rosen & Katz. The views expressed herein are solely those of the authors and should not be attributed to Maura Grossman’s law firm or its clients.

This guest blog constitutes the first public response by Professor Cormack and Maura Grossman, J.D., Ph.D., to articles published by one vendor, and others, that criticize their work. In the Editor’s opinion the criticisms are replete with misinformation and thus unfair. For background on the Cormack Grossman study in question, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014, and the Editor’s views on this important research, see Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One and Part Two and Part Three. After remaining silent for some time in the face of constant vendor potshots, Professor Cormack and Dr. Grossman feel that a response is now necessary. They choose to speak at this time in this blog because, in their words:

We would have preferred to address criticism of our work in scientifically recognized venues, such as academic conferences and peer-reviewed journals. Others, however, have chosen to spread disinformation and to engage in disparagement through social media, direct mailings, and professional meetings. We have been asked by a number of people for comment and felt it necessary to respond in this medium.

___________________

Guest Blog:  TALKING TURKEY

OrcaTec, the eDiscovery software company started by Herbert L. Roitblat, attributes to us the following words at the top of its home page: “Not surprisingly, costs of predictive coding, even with the use of relatively experienced counsel for machine-learning tasks, are likely to be substantially lower than the costs of human review.” These words are not ours. We neither wrote nor spoke them, although OrcaTec attributes them to our 2011 article in the Richmond Journal of Law and Technology (“JOLT article”).

[Ed. Note: The words were removed shortly after this blog was published.]


 

A series of five OrcaTec blog posts (1, 2, 3, 4, 5) impugning our 2014 articles in SIGIR and Federal Courts Law Review (“2014 FCLR article”) likewise misstates our words, our methods, our motives, and our conclusions. At the same time, the blog posts offer Roitblat’s testimonials—but no scientific evidence—regarding the superiority of his, and OrcaTec’s, approach.

As noted in Wikipedia, “a straw man is a common type of argument and is an informal fallacy based on the misrepresentation of an opponent’s argument. To be successful, a straw man argument requires that the audience be ignorant or uninformed of the original argument.” First and foremost, we urge readers to avoid falling prey to Roitblat’s straw man by familiarizing themselves with our articles and what they actually say, rather than relying on his representations as to what they say. We stand by what we have written.

Second, we see no reason why readers should accept Roitblat’s untested assertions, absent validation through the scientific method and peer review. For example, Roitblat claims, without providing any scientific support, that:

These claims are testable hypotheses, the formulation of which is the first step in distinguishing science from pseudo-science; but Roitblat declines to take the essential step of putting his hypotheses to the test in controlled studies.

Overall, Roitblat’s OrcaTec blog posts represent a classic example of truthiness. In the following paragraphs, we outline some of the misstatements and fallacious arguments that might leave the reader with the mistaken impression that Roitblat’s conclusions have merit.

With Us or Against Us?

Our JOLT article, which OrcaTec cites approvingly, concludes:

Overall, the myth that exhaustive manual review is the most effective—and therefore, the most defensible—approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.  Of course, not all technology-assisted reviews (and not all manual reviews) are created equal. The particular processes found to be superior in this study are both interactive, employing a combination of computer and human input.  While these processes require the review of orders of magnitude fewer documents than exhaustive manual review, neither entails the naïve application of technology absent human judgment. Future work may address which technology-assisted review process(es) will improve most on manual review, not whether technology-assisted review can improve on manual review (emphasis added; original emphasis in bold).

The particular processes shown to be superior, based on analysis of the results of the Interactive Task of the TREC 2009 Legal Track, were an active learning method employed by the University of Waterloo, and a rule-based method employed by H5. Despite the fact that OrcaTec chose not to participate in TREC, and their method—which employs neither active learning nor a rule base—is not one of those shown by our study to be superior, OrcaTec was quick to promote TREC and our JOLT article as scientific evidence for the effectiveness of their method.


In his OrcaTec blog posts following the publication of our SIGIR and 2014 FCLR articles, however, Roitblat espouses a different view. In Daubert, Rule 26(g) and the eDiscovery Turkey, he states that the TREC 2009 data used in the JOLT and SIGIR studies “cannot be seen as independent in any sense, in that the TREC legal track was overseen by Grossman and Cormack.” Notwithstanding his argumentum ad hominem, the coordinators of the TREC 2009 Legal Track included neither of us. Cormack was a TREC 2009 participant, who directed the Waterloo effort, while Grossman was a “Topic Authority,” who neither knew Cormack at the time, nor had any role in assessing the Waterloo effort. It was not until 2010 that Cormack and Grossman became Legal Track coordinators.

TREC Overview

Roitblat’s change of perspective perhaps owes to the fact that our SIGIR article is critical of random training for technology-assisted review (“TAR”), and our 2014 FCLR article is critical of “eRecall,” both methods advanced by Roitblat and employed by OrcaTec. But nothing about TREC 2009 or our JOLT study has changed in the intervening years, and the OrcaTec site continues—even at the time of this writing—to (mis)quote our work as evidence of OrcaTec’s effectiveness, despite Roitblat’s insistence that OrcaTec bears no resemblance to anything we have tested or found to be effective. The continuous active learning (“CAL”) system we tested in our SIGIR study, however, does resemble the Waterloo system shown to be more effective than manual review in our JOLT study. If OrcaTec bears no resemblance to the CAL system—or indeed, to any of the others we have tested—on what basis has OrcaTec cited TREC 2009 and our JOLT study in support of the proposition that their TAR tool works?

Apples v. Oranges

[Figure: gain curves graph]

Contrary to the aphorism, “you can’t compare apples to oranges,” you certainly can, provided that you use a common measure like weight in pounds, price in dollars per pound, or food energy in Calories. Roitblat, in comparing his unpublished results to our peer-reviewed results, compares the shininess of an apple in gloss units with the sweetness of an orange in percent sucrose equivalent. The graph above, reproduced from the first of the five Roitblat blogs, shows three dots placed by Roitblat over four “gain curves” from our SIGIR article. Roitblat states (emphasis added):

The x-axis shows the number of training documents that were reviewed. The y-axis shows the level of Recall obtained.

This may be true for Roitblat’s dots, but for our gain curves, on which his dots are superimposed, the x-axis shows the total number of documents reviewed, including both the training and review efforts combined.  Dots on a graph reflecting one measure, placed on top of curves reflecting a different measure, convey no more information than paintball splats.

For OrcaTec’s method, the number of training documents is tiny compared to the number of documents identified for subsequent review. Small wonder the dots are so far to the left. For a valid comparison, Roitblat would have to move his dots way to the right to account for the documents subject to subsequent review, which he has disregarded. Roitblat does not disclose the number of documents identified for review in the matters reflected by his three dots. We do know, however, that in the Global Aerospace case, OrcaTec was reported to achieve 81% recall with 5,000 training documents, consistent with the placement of Roitblat’s green dot. We also know that roughly 173,000 documents were identified for second-pass review. Therefore, in an apples-to-apples comparison with CAL, a dot properly representing Global Aerospace would be at the same height as the green dot, but 173,000 places farther to the right—far beyond the right edge of Roitblat’s graph.
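The axis mismatch can be made concrete with a little arithmetic. The following is a minimal Python sketch, not anything taken from either party's software: the function name is illustrative, and it assumes that total review effort is simply the training documents plus the documents sent on for subsequent review, using the Global Aerospace figures reported above.

```python
def total_review_effort(training_docs: int, review_docs: int) -> int:
    """Documents a human must look at: training plus subsequent review."""
    return training_docs + review_docs


# Figures reported above for the Global Aerospace matter.
training_docs = 5_000       # training documents (where the green dot sits)
second_pass_docs = 173_000  # documents identified for second-pass review
recall = 0.81               # reported recall, i.e. the height of the dot

x_training_only = training_docs
x_total_effort = total_review_effort(training_docs, second_pass_docs)

print(f"training-only x: {x_training_only:,}")
print(f"total-effort x:  {x_total_effort:,} at recall {recall:.0%}")
```

On the gain-curve scale the same 81% recall point sits at roughly 178,000 documents rather than 5,000, which is the 173,000-document shift to the right described above.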

Of course, even if one were to compare using a common measure, there would be little point, due to the number of uncontrolled differences between the situations from which the dots and gain curves were derived. Only a valid, controlled comparison can convey any information about the relative effectiveness of the two approaches.

Fool’s Gold?

In The Science of Comparing Learning Protocols—Blog Post II on the Cormack & Grossman Article, Roitblat seeks to discredit our SIGIR study so as to exempt OrcaTec from its findings. He misrepresents the context of our words in the highlighted quote below, claiming that they pertain to the “gold standard” we used for evaluation:

Here I want to focus on how the true set, the so-called “gold standard” was derived for [four of the eight] matters [Cormack and Grossman] present. They say that for the “true” responsiveness values “for the legal-matter-derived tasks, we used the coding rendered by the first-pass reviewer in the course of the review. Documents that were never seen by the first-pass reviewer (because they were never identified as potentially responsive) were deemed to be coded as non-responsive” (emphasis added).

As may be seen from our SIGIR article at page 155, the words quoted above do not refer to the gold standard at all, but to a deliberately imperfect “training standard” used to simulate human review. Our gold standard used a statistical sampling technique for the entire collection known as the Horvitz-Thompson estimator; a technique that has gained widespread acceptance in the scientific community since its publication, in 1952, in the Journal of the American Statistical Association.
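For readers unfamiliar with it, the Horvitz-Thompson estimator weights each sampled item by the inverse of its probability of being included in the sample. The Python sketch below shows only that general form. The strata, sampling rates, and counts are hypothetical and are not the sampling design or data from the SIGIR study.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    if len(values) != len(inclusion_probs):
        raise ValueError("each sampled value needs an inclusion probability")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must be in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical sample: y = 1 if the sampled document was judged relevant, else 0.
# Stratum A: 100 documents sampled with probability 0.1, 60 judged relevant.
# Stratum B: 90 documents sampled with probability 0.01, 3 judged relevant.
values = [1] * 60 + [0] * 40 + [1] * 3 + [0] * 87
probs = [0.1] * 100 + [0.01] * 90

estimated_relevant = horvitz_thompson_total(values, probs)
print(f"estimated relevant documents: {estimated_relevant:.0f}")
```

Each relevant document sampled at a 1-in-10 rate stands in for 10 documents in the collection, and each one sampled at a 1-in-100 rate stands in for 100, so this made-up sample yields an estimate of about 900 relevant documents in the whole collection.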

Apparently, to bolster his claims, Roitblat also provides a column of numbers titled “Precision,” on the right side of the table reproduced below.

[Image: Table 3]

We have no idea where these numbers came from—since we did not report precision in our SIGIR article—but if these numbers are intended to reflect the precision achieved by the CAL process at 90% recall, they are simply wrong. The correct numbers may be derived from the information provided in Table 1 (at page 155) and Figure 1 (at page 157) of our SIGIR article.
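As an illustration of how such a derivation could go, precision at a given recall level follows from two quantities: the number of relevant documents in the collection and the number of documents reviewed to reach that recall. The Python sketch below uses made-up numbers, not values from Table 1 or Figure 1 of the SIGIR article, and the function name is illustrative.

```python
def precision_at_recall(recall, total_relevant, docs_reviewed):
    """Precision = relevant documents found / documents reviewed."""
    if docs_reviewed <= 0:
        raise ValueError("docs_reviewed must be positive")
    relevant_found = recall * total_relevant
    return relevant_found / docs_reviewed


# Hypothetical inputs, chosen only to show the arithmetic.
hypothetical_total_relevant = 10_000  # relevant documents in the collection
hypothetical_docs_reviewed = 30_000   # review effort needed to reach 90% recall

precision = precision_at_recall(
    0.90, hypothetical_total_relevant, hypothetical_docs_reviewed
)
print(f"precision at 90% recall: {precision:.1%}")
```

With these invented inputs, 9,000 of the 30,000 reviewed documents are relevant, for a precision of 30%.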

While we make no claim that our study is without limitations (see Section 7.5 at page 161 of our SIGIR article), Roitblat’s special pleading regarding the real or imagined limitations of our study provides no support for his claim that random training (using the OrcaTec tool in particular) achieves superior results to active learning. If Roitblat believes that a different study would show a contrary result to ours, he should conduct such a study, and submit the results for peer review.

Although we have been described by Roitblat as “CAR vendors” with a “vested interest in making their algorithm appear better than others,” we have made freely available our TAR Evaluation Toolkit, which contains the apparatus we used to conduct our SIGIR study, including the support vector machine (“SVM”) learning algorithm, the simulation tools, and four of the eight datasets. Researchers are invited to reproduce our results—indeed, we hope, to improve on them—by exploring other learning algorithms, protocols, datasets, and review tasks. In fact, in our SIGIR article at page 161, we wrote:

There is no reason to presume that the CAL results described here represent the best that can be achieved. Any number of feature engineering methods, learning algorithms, training protocols, and search strategies might yield substantive improvements in the future.

Roitblat could easily use our toolkit to test his claims, but he has declined to do so, and has declined to make the OrcaTec tool available for this purpose. We encourage other service providers to use the toolkit to evaluate their TAR tools, and we encourage their clients to insist that they do, or to conduct or commission their own tests. The question of whether Vendor X’s tool outperforms the free software we have made available is a hypothesis that may be tested, not only for OrcaTec, but for every vendor.

Since SIGIR, we have expanded our study to include the 103 topics of the RCV1-v2 dataset, with prevalences ranging from 0.0006% (5 relevant documents in 804,414) to 47.4% (381,000 relevant documents in 804,414). We used the SVMlight tool and word-based tf-idf tokenization strategy that the RCV1-v2 authors found to be most effective. We used the topic descriptions, provided with the dataset, as keyword “seed queries.” We used the independent relevance assessments, also provided with the dataset, as both the training and gold standards. The results—on 103 topics—tell the same story as our SIGIR paper, and will appear—once peer reviewed—in a forthcoming publication.
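The prevalence range quoted above is simply the number of relevant documents divided by the collection size. A quick Python check of the two endpoints, using only the counts stated in the paragraph (the function name is illustrative):

```python
COLLECTION_SIZE = 804_414  # documents in RCV1-v2, as stated above


def prevalence_pct(relevant_docs, collection_size=COLLECTION_SIZE):
    """Prevalence as a percentage of the whole collection."""
    return 100.0 * relevant_docs / collection_size


lowest = prevalence_pct(5)
highest = prevalence_pct(381_000)

print(f"lowest prevalence:  {lowest:.4f}%")   # 0.0006%
print(f"highest prevalence: {highest:.1f}%")  # 47.4%
```

The two topics differ in prevalence by a factor of more than 70,000, which gives a sense of how wide a range of review tasks the expanded study covers.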

Straw Turkey

We were dumbfounded by Roitblat’s characterization of our 2014 FCLR article:

Schieneman and Gricks argue that one should measure the outcome of eDiscovery efforts to assess their reasonableness, and Grossman and Cormack argue that such measurement is unnecessary under certain conditions.

What we actually wrote was:

[Schieneman and Gricks’] exclusive focus on a particular statistical test, applied to a single phase of a review effort, does not provide adequate assurance of a reasonable production, and may be unduly burdensome. Validation should consider all available evidence concerning the effectiveness of the end-to-end review process, including prior scientific evaluation of the TAR method, its proper application by qualified individuals, and proportionate post hoc sampling for confirmation purposes (emphasis added).

Roitblat doubles down on his strawman, asserting that we eschew all measurement, insisting that our metaphor of cooking a turkey is inconsistent with his false characterization of our position. We have never said—nor do we believe—that measurement is unnecessary for TAR. In addition to pointing out the necessity of ensuring that the method is sound and is properly applied by qualified individuals, we state (at page 312 of our 2014 FCLR article) that it is necessary to ensure “that readily observable evidence—both statistical and non-statistical—is consistent with the proper functioning of the method.”

The turkey-cooking metaphor appears at pages 301-302 of our 2014 FCLR article:

When cooking a turkey, one can be reasonably certain that it is done, and hence free from salmonella, when it reaches a temperature of at least 165 degrees throughout. One can be reasonably sure it has reached a temperature of at least 165 degrees throughout by cooking it for a specific amount of time, depending on the oven temperature, the weight of the turkey, and whether the turkey is initially frozen, refrigerated, or at room temperature. Alternatively, when one believes that the turkey is ready for consumption, one may probe the turkey with a thermometer at various places. Both of these approaches have been validated by biological, medical, and epidemiological evidence. Cooking a turkey requires adherence, by a competent cook, to a recipe that is known to work, while observing that tools like the oven, timer, and thermometer appear to behave properly, and that the appearance, aroma, and texture of the turkey turn out as expected. The totality of the evidence—vetting the method in advance, competently and diligently applying the method, and monitoring observable phenomena following the application of the method—supports the reasonable conclusion that dinner is ready.

Roitblat reproduces our story, and then argues that it is inconsistent with his mischaracterization of our position:

They argue that we do not need to measure the temperature of the turkey in order to cook it properly, that we can be reasonably sure if we roast a turkey of a specific weight and starting temperature for a specific time at a specific oven temperature. This example is actually contrary to their position. Instead of one measure, using a meat thermometer to assess directly the final temperature of the meat, their example calls on four measures: roasting time, oven temperature, turkey weight, and the bird’s starting temperature to guess at how it will turn out. . . .  To be consistent with their argument, they would have to claim that we would not have to measure anything, provided that we had a scientific study of our oven and a qualified chef to oversee the cooking process.

In our story, the turkey chef would need to ensure—through measurement and other observations—that the turkey was properly cooked, in order to avoid the risk of food poisoning. The weight of most turkeys sold in the U.S. is readily observable on the FDA label because it has been measured by the packer, and it is reasonable to trust that information. At the same time, a competent chef could reasonably be expected to notice if the label information were preposterous; for example, six pounds for a full-sized turkey. If the label were missing, nothing we have ever said would even remotely suggest that the chef should refrain from weighing the turkey with a kitchen scale—assuming one were available—or even a bathroom scale, if the alternative was for everyone to go hungry. Similarly, if the turkey were taken from a functioning refrigerator, and were free of ice, a competent chef would know the starting temperature with a margin of error that is inconsequential to the cooking time. Any functioning oven has a thermostat that measures and regulates its temperature. It is hard to imagine our chef having no ready access to some sort of timepiece with which to measure cooking time. Moreover, many birds come with a built-in gizmo that measures the turkey’s temperature and pops up when the temperature is somewhat more than 165 degrees. It does not display the temperature at all, let alone with a margin of error and confidence level, but it can still provide reassurance that the turkey is done. We have never suggested that the chef should refrain from using the gizmo, but if it pops up after one hour, or the turkey has been cooking for seven hours and it still has not popped up, they should not ignore the other evidence. And, if the gizmo is missing when the turkey is unwrapped, our chef can still cook dinner without running out to buy a laboratory thermometer.

The bottom line is that there are many sources of evidence—statistical and otherwise—that can tell us whether a TAR process has been reasonable.

Your Mileage May Vary

Roitblat would have us believe that science has no role to play in determining which TAR methods work, and which do not. In his fourth blog post, Daubert, Rule 26(g) and the eDiscovery Turkey, he argues that there are too many “[s]ources of variability in the eDiscovery process”; that every matter and every collection is different, and that “[t]he system’s performance in a ‘scientific study’ provides no information about any of these sources of variability. . . .” The same argument could be made about crash testing or EPA fuel economy ratings, since every accident, every car, every road, and every driver is also different.

The EPA’s infamous disclaimer, “your mileage may vary,” captures the fact that it is impossible to predict with certainty the fuel consumption of a given trip. But it would be very difficult indeed to find a trip for which a Toyota Prius consumed more fuel than a Hummer H1. And it would be a very good bet that, for your next trip, you would need less gas if you chose the Prius.

Manufacturers generally do not like controlled comparisons, because there are so few winners and so many also-rans. So it is with automobiles, and so it is with eDiscovery software. On the other hand, controlled comparisons help consumers and the courts to determine which TAR tools are reliable.

We have identified more than 100 instances—using different data collections with different prevalences, different learning algorithms, and different feature engineering methods—in which controlled comparison demonstrates that continuous active learning outperforms simple passive learning, and none in which simple passive learning prevails. Neither Roitblat, nor anyone else that we are aware of, has yet identified an instance in which OrcaTec prevails, in a controlled comparison, over the CAL implementation in our toolkit.
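The difference between the two protocols named above can be sketched in a few lines of Python. This is an illustration only. It is not the TAR Evaluation Toolkit, it substitutes a simple smoothed log-odds term scorer for the SVM learner, and it runs on a small synthetic collection. The collection size, prevalence, vocabulary, batch size, training-set size, and the function names (`make_collection`, `train`, `cal`, `spl`) are all assumptions made for the example. Its output says nothing about any real tool or dataset; it shows only how the protocols differ in when training happens.

```python
import math
import random

def make_collection(n_docs=5000, prevalence=0.02, vocab=500, topic_terms=25, seed=1):
    """Synthetic stand-in for a labeled collection: (term set, is_relevant) pairs."""
    rng = random.Random(seed)
    docs = []
    for _ in range(n_docs):
        relevant = rng.random() < prevalence
        terms = {rng.randrange(vocab) for _ in range(20)}            # background terms
        if relevant:
            terms |= {rng.randrange(topic_terms) for _ in range(8)}  # topic terms
        docs.append((frozenset(terms), relevant))
    return docs

def train(labeled):
    """Per-term log-odds weights with add-one smoothing (assumed learner, not an SVM)."""
    pos = [t for t, rel in labeled if rel]
    neg = [t for t, rel in labeled if not rel]
    weights = {}
    for term in set().union(*(t for t, _ in labeled)):
        p = (sum(term in t for t in pos) + 1) / (len(pos) + 2)
        q = (sum(term in t for t in neg) + 1) / (len(neg) + 2)
        weights[term] = math.log(p / q)
    return weights

def score(weights, terms):
    return sum(weights.get(t, 0.0) for t in terms)

def rank_unreviewed(docs, weights, reviewed):
    seen = set(reviewed)
    return sorted((i for i in range(len(docs)) if i not in seen),
                  key=lambda i: score(weights, docs[i][0]), reverse=True)

def cal(docs, seed_idx, effort, batch=50):
    """Continuous active learning: retrain after every batch of top-ranked documents."""
    reviewed = [seed_idx]
    while len(reviewed) < effort:
        weights = train([docs[i] for i in reviewed])
        ranked = rank_unreviewed(docs, weights, reviewed)
        reviewed += ranked[:min(batch, effort - len(reviewed))]
    return reviewed

def spl(docs, effort, train_size, seed=2):
    """Simple passive learning: train once on a random sample, then review the top-ranked rest."""
    rng = random.Random(seed)
    training = rng.sample(range(len(docs)), train_size)
    weights = train([docs[i] for i in training])
    return training + rank_unreviewed(docs, weights, training)[:effort - train_size]

def recall(docs, reviewed):
    total = sum(rel for _, rel in docs)
    return sum(docs[i][1] for i in reviewed) / total

docs = make_collection()
# The first relevant document stands in for a hit from a keyword seed query.
seed_idx = next(i for i, (_, rel) in enumerate(docs) if rel)
effort = 500  # total documents reviewed under each protocol
cal_reviewed = cal(docs, seed_idx, effort)
spl_reviewed = spl(docs, effort, train_size=250)
cal_recall = recall(docs, cal_reviewed)
spl_recall = recall(docs, spl_reviewed)
print(f"CAL recall after {effort} documents reviewed: {cal_recall:.2f}")
print(f"SPL recall after {effort} documents reviewed: {spl_recall:.2f}")
```

In `cal`, every reviewed document feeds the next round of training, whereas `spl` trains once on a random sample and never updates the model. Holding `effort` equal for both is what makes the comparison controlled.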

Illusion

In his fifth blog post, Daubert, Rule 26(g) and the eDiscovery Turkey: Tasting the eDiscovery Turkey, Part 2, Roitblat first claims that “[g]ood estimates of Recall can be obtained by evaluating a few hundred documents rather than the many thousands that could be needed for traditional measures of Recall,” but later admits that eRecall is a biased estimate of recall, “like a clock that runs a little fast or slow.” Roitblat further admits, “eRecall has a larger confidence interval than directly measured Recall because it involves the ratio of two random samples.” Roitblat then wonders “why [we] think that it is necessary to assume that the two measures [eRecall and the “direct method” of estimating recall] have the same confidence interval [(i.e., margin of error)].”

Our assumption came from representations made by Roitblat in Measurement in eDiscovery—A Technical White Paper:

Rather than exhaustively assessing a large random sample of thousands of documents [as required by the direct method], with the attendant variability of using multiple reviewers, we can obtain similar results by taking advantage of the fact that we have identified putatively responsive and putatively non-responsive documents. We use that information and the constraints inherent in the contingency table to evaluate the effectiveness of our process. Estimating Recall from Elusion can be called eRecall (emphasis added).

Our “mistake” was in taking Roitblat’s use of “similar results” to imply that an estimate of recall using eRecall would have a similar accuracy, margin of error, and confidence level to one obtained by the direct method; that is, unbiased, with a margin of error of ±5%, and a confidence level of 95%.

eRecall misses this mark by a long shot. If you set the confidence level to 95%, the margin of error achieved by eRecall is vastly larger than ±5%. Alternatively, if you set the margin of error to ±5%, the confidence level is vastly inferior to 95%, as illustrated below.

Table 2 at page 309 of our 2014 FCLR article (reproduced below) shows the result of repeatedly using eRecall, the direct method, and other methods to estimate recall for a review known to have achieved 75% recall and 83% precision, from a collection with 1% prevalence.

[Table 2]

To achieve a margin of error of ±5%, at the 95% confidence level, the estimate must fall between 70% and 80% (±5% of the true value) at least 95% of the time. From the fourth column of the table one can see that the direct method falls within this range 97.5% of the time, exceeding the standard for 95% confidence. eRecall, on the other hand, falls within this range a mere 8.9% of the time. If the recall estimate had been drawn at random from a hat containing all estimates from 0% to 100%, the result would have fallen within the required range 10% of the time—more often than eRecall. Therefore, for this review, eRecall provides an estimate that is no better than chance.

How large does the margin of error need to be for eRecall to achieve a 95% confidence level? The fifth and sixth columns of the table show that one would need to enlarge the target range to include all values between 0% and 100%, for eRecall to be able to hit the target 95% of the time. In other words, eRecall provides no information whatsoever about the true recall of this review, at the 95% confidence level. On the other hand, one could narrow the target range to include only the values between 70.6% and 79.2%, and the direct method would still hit it 95% of the time, consistent with a margin of error slightly better than ±5%, at the 95% confidence level.
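The kind of repeated-estimation experiment discussed above can be approximated with a short simulation. The Python sketch below is an illustration under stated assumptions, not the procedure used to produce Table 2. The collection size (1,000,000), the sample sizes (385 relevant documents for the direct method; 400 documents each for the prevalence and elusion samples), the number of trials, and the handling of undefined or out-of-range estimates are all choices made for this example. It computes eRecall as 1 minus the estimated number of relevant documents in the null set divided by the estimated number of relevant documents in the collection.

```python
import random

# Review described in the text: 75% recall, 83% precision, 1% prevalence.
N = 1_000_000                          # assumed collection size
RELEVANT = N // 100                    # 1% prevalence
TP = RELEVANT * 3 // 4                 # relevant documents retrieved (75% recall)
RETRIEVED = round(TP / 0.83)           # documents retrieved (83% precision)
NULL_SET = N - RETRIEVED               # documents not retrieved
ELUSION = (RELEVANT - TP) / NULL_SET   # share of the null set that is relevant

def binomial(rng, n, p):
    """Number of successes in n independent draws with success probability p."""
    return sum(rng.random() < p for _ in range(n))

def direct_estimate(rng, n_relevant=385):
    """Direct method: share of a random sample of relevant documents that was retrieved."""
    return binomial(rng, n_relevant, TP / RELEVANT) / n_relevant

def erecall_estimate(rng, n_prev=400, n_elusion=400):
    """eRecall from two samples: one for prevalence, one for elusion."""
    prev_hat = binomial(rng, n_prev, RELEVANT / N) / n_prev
    elusion_hat = binomial(rng, n_elusion, ELUSION) / n_elusion
    if prev_hat == 0:
        return None                    # undefined: no relevant document in the prevalence sample
    estimate = 1 - (elusion_hat * NULL_SET) / (prev_hat * N)
    return min(1.0, max(0.0, estimate))  # clamp to a valid recall value

def hit_rate(estimates, low=0.70, high=0.80):
    """Share of trials whose estimate is within 5 points of the true recall of 75%."""
    return sum(e is not None and low <= e <= high for e in estimates) / len(estimates)

def central_range(estimates, coverage=0.95):
    """Empirical interval holding the central `coverage` share of the defined estimates."""
    vals = sorted(e for e in estimates if e is not None)
    tail = int(len(vals) * (1 - coverage) / 2)
    return vals[tail], vals[-tail - 1]

rng = random.Random(0)
TRIALS = 2000
direct = [direct_estimate(rng) for _ in range(TRIALS)]
erecall = [erecall_estimate(rng) for _ in range(TRIALS)]
direct_hits, erecall_hits = hit_rate(direct), hit_rate(erecall)
direct_range, erecall_range = central_range(direct), central_range(erecall)

print(f"direct method: within 70-80% in {direct_hits:.1%} of trials, "
      f"central 95% range {direct_range[0]:.1%} to {direct_range[1]:.1%}")
print(f"eRecall:       within 70-80% in {erecall_hits:.1%} of trials, "
      f"central 95% range {erecall_range[0]:.1%} to {erecall_range[1]:.1%}")
```

Changing the assumed sample sizes shows how many documents each method needs before its estimates settle near the true recall of 75%. Note that at 1% prevalence, finding 385 relevant documents for the direct method means reviewing tens of thousands of randomly selected documents, which is the burden the text refers to.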

In short, the direct method provides a valid—albeit burdensome—estimate of recall, and eRecall does not.

Summary

Roitblat repeatedly puts words in our mouths to attack positions we do not hold in order to advance his position that one should employ OrcaTec’s software and accept—without any scientific evidence—an unsound estimate of its effectiveness. Ironically, one of the positions that Roitblat falsely attributes to us is that one should not measure anything. Yet, we have spent the better part of the last five years doing quantitative research—measuring—TAR methods.

The Future

We are convinced that sound quantitative evaluation is essential to inform the choice of tools and methods for TAR, to inform the determination of what is reasonable and proportionate, and to drive improvements in the state of the art. We hope that our studies so far—and our approach, as embodied in our TAR Evaluation Toolkit—will inspire others, as we have been inspired, to seek even more effective and more efficient approaches to TAR, and better methods to validate those approaches through scientific inquiry.

Our next steps will be to expand the range of datasets, learning algorithms, and protocols we investigate, as well as to investigate the impact of human factors, stopping criteria, and measures of success. We hope that information retrieval researchers, service providers, and consumers will join us in our quest, by using our toolkit, by allowing us to evaluate their efforts using our toolkit, or by conducting scientific studies of their own.

