Visualizing Data in a Predictive Coding Project

November 9, 2014

This blog will share a new way to visualize data in a predictive coding project. I only include a brief description this week. Next week I will add a full description of this project. Advanced students should be able to predict the full text from the images alone. Study the text and try to figure out the details of what is going on.

Soon all good predictive coding software will include visualizations like this to help searchers to understand the data. The images can be automatically created by computer to accurately visualize exactly how the data is being analyzed and ranked. Experienced searchers can use this kind of visual information to better understand what they should do next to efficiently meet their search and review goals.

For a game, try to figure out the high and low number of relevant documents that you must find in this review project to claim, with a 95% confidence level, that you have found all relevant documents, the mythical total recall. This high-low range will be wrong one time out of twenty; that is what the 95% confidence level means. Still, this knowledge is helpful. The correct answer to questions of recall and prevalence is always a high-low range of documents, never just one number, and never a percentage. Also, there are always confidence level caveats. With these limitations in mind, for extra points, state what the spot projection is for prevalence. These illustrations and short descriptions provide all of the information you need to calculate these answers.

The project begins with a collection of documents here visualized by the fuzzy ball of unknown data.


Next the data is processed, deduplicated, and deNISTed, and non-text and other documents unsuitable for analytics are removed. By good fortune exactly one million documents remain.


We begin with some multimodal judgmental sampling, and with a random sample of 1,534 documents. Assuming a 95% confidence level, what confidence interval does this create?
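A rough sketch of one way to approach this question, not necessarily the method behind the full answer to come: the standard normal-approximation formula for the margin of error of a proportion, using the most conservative assumption of 50% prevalence. The function name, the z-value of 1.96 for a 95% confidence level, and the optional finite population correction are illustrative assumptions, not details taken from this project.

```python
import math


def margin_of_error(sample_size, population=None, z=1.96, p=0.5):
    """Normal-approximation margin of error for a sampled proportion.

    z=1.96 corresponds to a 95% confidence level (assumption).
    p=0.5 is the most conservative prevalence assumption, used when
    nothing is yet known about the collection.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    if population is not None:
        # Finite population correction; nearly 1.0 when the sample
        # is tiny compared with the collection.
        standard_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error


# Sample of 1,534 documents drawn from a collection of 1,000,000.
moe = margin_of_error(1534, population=1_000_000)
print(f"Margin of error: +/- {moe * 100:.2f}%")
```

Under these assumptions the interval comes out at roughly plus or minus 2.5%. Other interval methods would give somewhat different figures.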


Assume that an SME reviewed the 1,534-document sample and found that 384 were relevant and 1,150 were irrelevant.
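As a hedged illustration of the arithmetic involved, the sketch below projects the sample result onto the one million document collection. It assumes a margin of error of about plus or minus 2.5%, which is what a simple normal approximation gives for a sample of this size. The function name and the `margin` parameter are hypothetical, and other interval methods, such as an exact binomial calculation, would give somewhat different endpoints.

```python
def prevalence_projection(relevant, sample_size, population, margin=0.025):
    """Project sample prevalence onto the whole collection.

    margin is the assumed margin of error at a 95% confidence level
    (an illustrative value, not one stated in the project description).
    Returns document counts, not percentages.
    """
    spot = relevant / sample_size       # spot projection of prevalence
    low = max(0.0, spot - margin)       # low end of the range
    high = min(1.0, spot + margin)      # high end of the range
    return {
        "spot": round(spot * population),
        "low": round(low * population),
        "high": round(high * population),
    }


# 384 relevant documents in a sample of 1,534, collection of 1,000,000.
projection = prevalence_projection(384, 1534, 1_000_000)
print(projection)
```

The result is a high-low range of documents with a spot projection near its middle, which is the form of answer the game asks for.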


Training Begins

Next we do the first round of machine training. The first round of training is sometimes called the seed set. Now the document ranking according to probable relevance and irrelevance begins. To keep it simple we only show the relevance ranking, and not also the irrelevance metrics display. The top represents 99.9% probable relevance. The bottom represents the inverse, 0.1% probable relevance. Put another way, the bottom would represent 99.9% probable irrelevance. For simplicity’s sake we also assume that the analytics are directed towards relevance alone, whereas most projects would also include high-relevance and privilege. In this project the data ball changed to the following distribution. Note the lighter colors represent less density of documents. Red represents documents coded or predicted as relevant, and blue as irrelevant. All predictive coding projects are different and the distributions shown here are just one among near countless possibilities.


Next we see the data after the second round of training. Note that the training could with most software be continuous. But I like to control when the training happens in order to better understand the impact of my machine training. The SME human trains the machine, and, in an ideal situation, the machine also trains the SME. The human SME understands how the machine is learning. The SME learns where the machine needs the most help to tune into their conception of relevance. This kind of cross-communication makes it easier for the artificial intelligence to properly boost the human intelligence.


Next we see the data after the third round of training. The machine is learning very quickly. In most projects it takes longer than this to attain this kind of ranking distribution. What does this tell us about the number of documents between rounds of training?


Now we see the data after the fourth round of training. It is an excellent distribution, so we decide to stop, and a second random sample comes next. That visualization, and a full description of the project, will be provided next week. In the meantime, leave your answers to the questions in the comments below. This is a chance to strut your stuff. If you prefer, send me your answers, and questions, by private email.


Hadoop, Data Lakes, Predictive Analytics and the Ultimate Demise of Information Governance – Part Two

November 2, 2014

This is the second part of a two-part blog. Please read part one first.

AI-Enhanced Big Data Search Will Greatly Simplify Information Governance

Information Governance is, or should be, all about finding the information you need, when you need it, and doing so in a cheap and efficient manner. Information needs are determined by both law and personal preferences, including business operation needs. In order to find information, you must first have it. Not only that, you must keep it until you need it. To do that, you need to preserve the information. If you have already destroyed information, really destroyed it I mean, not just deleted it, then obviously you will not be able to find it. You cannot find what does not exist, as all Unicorn chasers eventually find out.

This creates a basic problem for Information Governance because the whole system is based on a notion that the best way to find valuable information is to destroy worthless information. Much of Information Governance is devoted to trying to determine what information is a valuable needle, and what is worthless chaff. This is because everyone knows that the more information you have, the harder it is for you to find the information you need. The idea is that too much information will cut you off. These maxims were true in the pre-AI-Enhanced Search days, but are, IMO, no longer true today, or, at least, will not be true in the next five to ten years, maybe sooner.

In order to meet the basic goal of finding information, Information Governance focuses its efforts on the proper classification of information. Again, the idea was to make it simpler to find information by preserving some of it, the information you might need to access, and destroying the rest. That is where records classification comes in.

The question of what information you need has a time element to it. The time requirements are again based on personal and business operations needs, and on thousands of federal, state and local laws. Information governance thus became a very complicated legal analysis problem. There are literally thousands of laws requiring certain types of information to be preserved for various lengths of time. Of course, you could comply with most of these laws by simply saving everything forever, but, in the past, that was not a realistic solution. There were severe limits on the ability to save information, and the ability to find it. Also, it was presumed that the older information was, the less value it had. Almost all information was thus treated like news.

These ideas were all firmly entrenched before the advent of Big Data and AI-enhanced data mining. In fact, in today’s world there is good reason for Google to save every search, ever done, forever. Some patterns and knowledge only emerge in time and history. New information is sometimes better information, but not necessarily so. In the world of Big Data all information has value, not just the latest.

These records life-cycle ideas all made perfect sense in the world of paper information. It cost a lot of money to save and store paper records. Everyone with a monthly Iron Mountain paper records storage bill knows that. Even after the computer age began, it still cost a fair amount of money to save and store ESI. The computers and digital storage you needed to buy and maintain used to be very expensive. Finding the ESI you needed quickly on a computer was still very difficult and unreliable. All we had at first was keyword search, and that was very ineffective.

Due to the costs of storage, and the limitations of search, tremendous efforts were made by record managers to try to figure out what information was important, or needed, either from a legal perspective, or a business necessity perspective, and to save that information, and only that information. The idea behind Information Management was to destroy the ESI you did not need or were not required by law to preserve. This destruction saved you money, and, it also made possible the whole point of Information Governance, to find the information you wanted, when you wanted it.

Back in the pre-AI search days, the more information you had, the harder it was to find the information you needed. That still seems like common sense. Useless information was destroyed so that you could find valuable information. In reality, with the new and better algorithms we now have for AI-enhanced search, it is just the reverse. The more information you have, the easier it becomes to find what you want. You now have more information to draw upon.

That is the new reality of Big Data. It is a hard intellectual paradigm to jump, and seems counter-intuitive. It took me a long time to get it. The new ability to save and search everything cheaply and efficiently is what is driving the explosion of Big Data services and products. As the save everything, find anything way of thinking takes over, the classification and deletion aspects of Information Governance will naturally dissipate. The records lifecycle will transform into virtual immortality. There is no reason to classify and delete, if you can save everything and find anything at low cost. The issues simplify; they change to how to save and search, although new collateral issues of security and privacy grow in importance.

Save and Search v. Classify and Delete

The current clash in basic ideas concerning Big Data and Information Governance is confusing to many business executives. According to Gregory Bufithis who attended a recent event in Washington D.C. on Big Data sponsored by EMC, one senior presenter explained:

The C Suite is bedeviled by IG and regulatory complexity. … 

The solution is not to eliminate Information Governance entirely. The reports of its complete demise, here or elsewhere, are exaggerated. The solution is to simplify IG. To pare it down to save and search. Even this will take some time, like I said, from five to ten years, although there is some chance this transformation of IG will go even faster than that. This move away from complex regulatory classification schemes, to simpler save and search everything, is already being adopted by many in the high-tech world. To quote Greg again from the private EMC event in D.C. in October, 2014:

Why data lakes? Because regulatory complexity and the changes can kill you. And are unpredictable in relationship to information governance. …

So what’s better? Data lakes coupled with archiving. Yes, archiving seems emblematic of “old” IT. But archiving and data lifecycle management (DLM) have evolved from a storage focus, to a focus on business value and data loss prevention. DLM recognizes that as data gets older, its value diminishes, but it never becomes worthless. And nobody is throwing out anything and yes, there are negative impacts (unnecessary storage costs, litigation, regulatory sanctions) if not retained or deleted when it should be.

But … companies want to mine their data for operational and competitive advantage. So data lakes and archiving their data allows for ingesting and retain all information types, structured or unstructured. And that’s better.

Because then all you need is a good search platform or search system … like Hadoop which allows you to sift through the data and extract the chunks that answer the questions at hand. In essence, this is a step up from OLAP (online analytical processing). And you can use “tag sift sort” programs like Data Rush. Or ThingWorx which is an approach that monitors the stream of data arriving in the lake for specific events. Complex event processing (CEP) engines can also sift through data as it enters storage, or later when it’s needed for analysis.

Because it is all about search.

Recent Breakthroughs in Artificial Intelligence Make Possible Save Everything, Find Anything

The New York Times in an opinion editorial this week discussed recent breakthroughs in Artificial Intelligence and speculated on alternative futures this could create. Our Machine Masters, NY Times Op-Ed, by David Brooks (October 31, 2014). The Times article quoted extensively another article in the current issue of Wired by technology blogger Kevin Kelly: The Three Breakthroughs That Have Finally Unleashed AI on the World. Kelly argues, as do I, that artificial intelligence has now reached a breakthrough level. This artificial intelligence breakthrough, Kevin Kelly argues, and David Brooks agrees, is driven by three things: cheap parallel computation technologies, big data collection, and better algorithms. The upshot is clear in the opinion of both Wired and the New York Times: “The business plans of the next 10,000 start-ups are easy to forecast: Take X and add A.I. This is a big deal, and now it’s here.”

These three new technology advances change everything. The Wired article goes into the technology and financial aspects of the new AI; it is where the big money is going and will be made in the next few decades. If Wired is right, then this means in our world of e-discovery, companies and law firms will succeed if, and only if, they add AI to their products and services. The firms and vendors who add AI to document review, and project management, will grow fast. The non-AI enhanced vendors, non-AI enhanced software, will go out of business. The law firms that do not use AI tools will shrink and die.

The Times article by David Brooks goes into the sociological and philosophical aspects of the recent breakthroughs in Artificial Intelligence:

Two big implications flow from this. The first is sociological. If knowledge is power, we’re about to see an even greater concentration of power.  … [E]ngineers at a few gigantic companies will have vast-though-hidden power to shape how data are collected and framed, to harvest huge amounts of information, to build the frameworks through which the rest of us make decisions and to steer our choices. If you think this power will be used for entirely benign ends, then you have not read enough history.

The second implication is philosophical. A.I. will redefine what it means to be human. Our identity as humans is shaped by what machines and other animals can’t do. For the last few centuries, reason was seen as the ultimate human faculty. But now machines are better at many of the tasks we associate with thinking — like playing chess, winning at Jeopardy, and doing math. [RCL – and, you might add, better at finding relevant evidence.]

On the other hand, machines cannot beat us at the things we do without conscious thinking: developing tastes and affections, mimicking each other and building emotional attachments, experiencing imaginative breakthroughs, forming moral sentiments. [RCL – and, you might add, better at equitable notions of justice and at legal imagination.]

In this future, there is increasing emphasis on personal and moral faculties: being likable, industrious, trustworthy and affectionate. People are evaluated more on these traits, which supplement machine thinking, and not the rote ones that duplicate it.

In the cold, utilitarian future, on the other hand, people become less idiosyncratic. If the choice architecture behind many decisions is based on big data from vast crowds, everybody follows the prompts and chooses to be like each other. The machine prompts us to consume what is popular, the things that are easy and mentally undemanding.

I’m happy Pandora can help me find what I like. I’m a little nervous if it so pervasively shapes my listening that it ends up determining what I like. [RCL – and, you might add, determining what is relevant, what is fair.]

I think we all want to master these machines, not have them master us.

Although I share the concerns of the NY Times about mastering machines and alternative future scenarios, my analysis of the impact of the new AI is focused on, and limited to, the Law. Lawyers must master the AI-search for evidence processes. We must master and use the better algorithms, the better AI-enhanced software, not vice versa. The software does not, nor should it, run itself. Easy buttons in legal search are a trap for the unwary, a first step down a slippery slope to legal dystopia. Human lawyers must never over-delegate our uniquely human insights and abilities. We must train the machines. We must stay in charge and assert our human insights on law, relevance, equity, fairness and justice, and our human abilities to imagine and create new realities of justice for all. I want lawyers and judges to use AI-enhanced machines, but I never want to be judged by a machine alone, nor have a computer alone as a lawyer.

The three big new advances that are allowing better and better AI are nowhere near to threatening the jobs of human judges or lawyers, although they will likely reduce their numbers, and certainly will change their jobs. We are already seeing these changes in Legal Search and Information Governance. Thanks to cheap parallel computation, we now have Big Data Lakes stored in thousands of inexpensive cloud computers that are operating together. This is where open-sourced software like Hadoop comes in. It makes the big clusters of computers possible. Better algorithms are where better AI-enhanced software comes in. This makes it possible to use predictive coding effectively and inexpensively to find the information needed to resolve lawsuits. The days of vast numbers of document reviewer attorneys doing linear review are numbered. Instead, we will see a few SMEs, working with small teams of reviewers, search experts, and software experts.

The role of Information Managers will also change drastically. Because of Big Data, cheap parallel computing, and better algorithms, it is now possible to save everything, forever, at a small cost, and to quickly search and find what you need. The new reality of Save Everything, Find Anything undercuts most of the rationale of Information Governance. It is all about search now.


Now that storage costs are negligible, and search far more efficient, the twin motivators of Information Science to classify and destroy are gone, or soon will be. The key remaining tasks of Information Governance are now preservation and search, plus relatively new ones of security and privacy. I recognize that the demise of the importance of destruction of ESI could change if more governments enact laws that require the destruction of ESI, like the EU has done with Facebook posts and the so-called “right to be forgotten law.” But for now, most laws are about saving data for various times, and do not require data be destroyed. Note that the new Delaware law on data destruction still keeps it discretionary on whether to destroy personal data or not. House Bill No. 295 – The Safe Destruction of Documents Containing Personal Identifying Information. It only places legal burdens and liability for failures to properly destroy data. This liability for mistakes in destruction serves to discourage data destruction, not encourage it.

Preservation is not too difficult when you can economically save everything forever, so the challenging task remaining is really just one of search. That is why I say that Information Governance will become a sub-set of search. The save everything forever model will, however, create new legal work for lawyers. The cybersecurity protection and privacy aspects of Big Data Lakes are already creating many new legal challenges and issues. More legal issues are sure to arise with the expansion of AI.

Automation, including this latest Second Machine Age of mental process automation, does not eliminate the need for human labor. It just makes our work more interesting and opens up more time for leisure. Automation has always created new jobs as fast as it has eliminated old ones. The challenge for existing workers like ourselves is to learn the new skills necessary to do the new jobs. For us e-discovery lawyers and techs, this means, among other things, acquiring new skills to use AI-enhanced tools. One such skill, the ability for HCIR, human computer information retrieval, is mentioned in most of my articles on predictive coding. It involves new skill sets in active machine learning to train a computer to find the evidence you want from large collections of data sets, typically emails. When I was a law student in the late 1970s, I could never have dreamed that this would be part of my job as a lawyer in 2014.

The new jobs do not rely on physical or mental drudgery and repetition. Instead, they put a premium on what makes us distinctly human: our deep knowledge, understanding, wisdom, and intuition; our empathy, caring, love and compassion; our morality, honesty, and trustworthiness; our sense of justice and fairness; our ability to change and adapt quickly to new conditions; our likability, good will, and friendliness; our imagination, art, wisdom, and creativity. Yes, even our individual eccentricities, and our all important sense of humor. No matter how far we progress, let us never lose that! Please be governed accordingly.

e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part Two

October 19, 2014

This is part two of an article providing an e-discovery industry insiders’ view of the possible purchase of Equivio by Microsoft. Please read Part One first. So far the acquisition by Microsoft is still just a rumor, but do not be surprised if it is officially announced soon.

Another e-discovery insider has agreed to go public with his comments, and three more anonymous submissions were received. Let’s begin with these quotes, and then I will move onto some analysis and opinions on this deal and the likely impact on our industry.

More Industry Insider Comments


Jon Kerry-Tyerman (VP, Business Development, Everlaw): “If you think about this potential acquisition in the context of the EDRM, it makes a lot of sense. The technological issues on the left-hand side—from Information Governance through Preservation and Collection—are primarily search-related, rather than discovery-related.  And the technology behind search is largely a problem that’s been solved. That’s why we see these tasks being commoditized by the providers of the systems on which these data reside, entrenched players like Microsoft and Google. Microsoft has already shown a willingness to wade deeper here (see, e.g., Matter Center for Office 365), so the acquisition of Equivio’s expertise to improve document search and organization within the enterprise is a logical extension of that strategy.

I don’t think, however, that this heralds an expansion by Microsoft into the wider “ediscovery” space. The tasks on the right-hand side of the EDRM—particularly Review through Presentation—depend on expert legal judgment. While technology cannot supplant that judgment, it can be used to augment it. Doing so effectively, however, requires a nuanced understanding of the unique legal and technological problems underlying these tasks, and the resulting solutions are not easily applicable to other domains. For a big fish like Microsoft, that’s simply too small a pond in which to swim. It happens to be the perfect environment for a technology startup, however, which is why we’re focusing exclusively on applying cutting-edge computer science to the right-hand side of the EDRM—including our proprietary (read: non-Equivio!) predictive coding system.”

Anonymous One (a tech commentator not in the e-discovery world, who provides an interesting outsider view): “I read the commentary and found it to be fairly eDiscovery introspective. What I think is:

  1. I don’t know the Equivio markets as well as I should. I thought Equivio was/is a classification engine that did a wonderful job of deduplication of email threads. They played in the eDiscovery markets and we don’t focus on these markets except for their relevance to information governance.
  2. Equivio lacked a coherent strategy to integrate to the Microsoft stack, at the level of managed metadata, content types, and site provisioning, which doomed them to bit player status unless someone acquired them or they committed to tight integration with the hybrid SharePoint/Office 365/Exchange/OneDrive/Yammer/Delve/File Share stack for unified content governance. Now someone has. Hats off to Warwick & Co. for $200MM for this.
  3. My expectation is that Equivio will be added into Office 365 and Delve to crawl through everything you own and classify it, launching whatever processes you want. This is not good news for Content Analyst, dataglobal, Nuix, HP Autonomy, or Google, except that Google and HP are able to stand on their own. It is also not good news, but less bad news for Concept Searching, Smart Logic and BA Insight, in that they leverage SharePoint and Office 365 Search and extend it with integration points and connectors to other systems.
  4. Microsoft is launching Matter Center at LegalTech in NYC in February after announcing it at ILTA. This is the first of the vertical solutions that begin the long journey of companies to adopt either the Microsoft or Google cloud solution stacks and abandon the isolated silos of information like Box, Dropbox, etc., for the corporate side of information management.”


Anonymous Two: “It’s an interesting move for Microsoft. $200M is a little high for tools in our industry, but is peanuts for them. They make dozens of these types of moves and spend billions each year acquiring various companies and technologies. I agree with Craig Ball regarding how many times have we seen formidable competitors go the way of the Dodo after they were purchased by a bigger company. I highly doubt they are planning to jump into our industry to lock horns with all of us. It is more likely that they may be developing some sort of Information Governance & Analysis offering for businesses, which could have some downstream effects on eDiscovery.”


Anonymous Three: “The acquisition of Equivio by Microsoft and the price paid are not a complete surprise. I agree with others who do not see this as a sign of Microsoft entering the ediscovery business. If Microsoft wanted to do that it could acquire any of the big ediscovery players out there. Rather, the Equivio acquisition allows Microsoft to offer a service that other big data companies cannot. Putting aside HP’s acquisition of Autonomy, I think Microsoft’s acquisition of Equivio is only the first of what will be a series of technology acquisitions by big data companies. These companies, that handle terabytes upon terabytes of data for major corporations around the world, can one day provide ediscovery as an additional service offering. That day isn’t today, but it is coming.”

What Microsoft Will Do With Equivio

The consensus view is that after the purchase Microsoft will essentially disband Equivio and absorb its technology, its software designs, and some of its experts. Then, as Craig Ball predicts, they will wander the halls of Redmond like the great cynic Diogenes. No one seems to think that Microsoft will continue Equivio’s business. For that reason it would make no sense for Microsoft to continue to license the Equivio search technologies to e-discovery companies. That in turn means a large part of the e-discovery industry that now depends on Equivio search components, and licenses with Equivio, will soon be out of luck. Zoom will go boom! More on that later.

If Microsoft did not buy Equivio to continue its business, why did it want its technology? As the scientists I talked to all told me, Microsoft already has plenty of artificial intelligence based text search capabilities, software, and patents. But maybe they are not designed for searching through large disorganized corporate datasets, such as email? Maybe their software in this area is not nearly as good as Equivio’s. As smart and talented as my scientist friends seem to think Microsoft is, the company seems to have a black hole of incompetence when it comes to email search and other aspects of information management.

The consensus view is that Microsoft wants Equivio to grab its technology and patents (at least one commentator also thought they were after Equivio’s customers). The Microsoft plan is probably to incorporate Equivio’s software code into various existing Microsoft products and new products under development. Almost no one expects those new products to be e-discovery specific. They might, however, help provide a discovery search overlay to existing software. Outlook, for instance, has pathetic search capacities that frustrate millions daily. Maybe they will add better e-discovery aspects to that. I personally expect (hope) they will do that.

Information Governance Is Now King

I also agree with the consensus view in our industry, a view that is now preoccupied with Information Governance, that Microsoft’s new products using Equivio technology will be information governance products. I expect Microsoft to once again follow IBM and focus on the left side of EDRM. I expect Microsoft to come out with new Governance type products and software module add-ons. I do not think that Microsoft will go into litigation support specific products, such as document review software, nor litigation search oriented products. Like IBM, they think it is still too small a market and too specialized a market.

Bottom line, Microsoft is not interested in entering the e-discovery specific market at this time, any more than IBM is. Instead, like most (but not all, especially Google) of the smart creatives of the technology world, Microsoft has bought into the belief that information is something that can be governed, can be managed. They think that Information Governance is like paper records management, just with more zeros after the number of records involved. The file-everything librarian mentality lives on, or tries to.

The Inherent Impossibility, in the Long Run, of Information Governance

Most of the e-discovery world now believes that Information Governance is not only possible, but it is the savior to the information deluge that floods us all. I disagree, especially in the long run. I appear to be a lone dissenting voice on this in e-discovery. I think the establishment majority in our industry is deluding themselves into thinking that information is like paper, only there is more of it. They delude themselves into thinking that Information is capable of being governed, just like so many little paper soldiers in an army. I say the Emperor has no clothes. That information cannot be governed.


Electronic Information is a totally new kind of force, something Mankind has never seen before. Digital Information is a Genie out of the bottle. It cannot be captured. It cannot be managed. It certainly cannot be governed. It cannot even be killed. Forget about trying to put it back in the bottle. It is breeding faster than even Star Trek’s Tribbles could imagine. Like Baron and Paul discussed in their important 2007 law review, ESI is like a new Universe, and we are living just moments after the Big Bang. George L. Paul and Jason R. Baron, Information Inflation: Can the Legal System Adapt? 13 RICH. J.L. & TECH. 10 (2007).



What few outside of Google, Baron, and Paul seem to grasp is that Information has a life of its own. Id. at FN 30 (quoting Ludwig Wittgenstein (a 20th Century Austrian philosopher whom I was forced to study while in college in Vienna): “[T]o imagine a language is to imagine a form of life.”) Electronic information is a new and unique life form that defies all attempts of limitation, much less governance. As James Gleick observed in his book on information science, everything is a form of information. The Universe itself is a giant computer and we are all self-evolving algorithms. James Gleick, The Information: A History, a Theory, a Flood.

Essentially information is free, and wants to be free. It does not want to be governed, or charged for. Information is more useful when free and when it is not subject to transitory restraints.


Regardless of the economic aspects, and whether information really wants to be free or not, as a practical matter Information cannot be governed, even if some of it can be commoditized. Information is moving and growing far too fast for governance.

Digitized information is like a nuclear reaction that has passed the point of no return. The chain reaction has been triggered. This is what exponential growth really means. In time such fission vision will be obvious. Even people without Google glasses will be able to see it.

Nuclear chain reaction

In the meantime we have a new breed of information governance experts running around who serve like heroic bomb squads. Some know that it is just a noble quest, doomed to failure. Most do not. They helicopter into corporate worlds attempting to defuse ticking information bombs. They build walls around it. They confidently set policies and promulgate rules. They talk sternly about enforcement of rules. They automate filing. They automate deletion. Some are even starting to make robot file clerks.

Information governance experts, just like the records managers before them, are all working diligently to try to solve today’s problems of information management. But, all the while, ever new problems encroach upon their walls. They cannot keep up with this growth, the new forms of information. The next generation of exponential growth builds faster than anyone can possibly govern. Do they not know that the bomb has already exploded? The tipping point has already passed?

Information governance policies that are being created today are like sand castles built at low tide. Can you hear the next wave of data generated by the Internet of Things? It will surely wash away all of today’s efforts. There will always be more data, more unexpected new forms of information. Governance of information is a dream, a Don Quixote quest.

Information cannot be governed. It can only be searched.

In my view we should focus on search technologies, and give up on governance. Or at least realize it is a mere stop-gap measure. In the world I see, search is king, not governance. Do not waste your valuable time and effort trying to file information. Just search for it, when and if you need it. You will not need most of it anyway.

I do not really think Microsoft has the fission vision, but I could be wrong. They may well see the world like I do, and like Google does, and realize that it is all search now. Microsoft may already understand that information governance is just a subset of search, not vice versa. Maybe Microsoft is already focused on creating great new search software that will help us transition from governance to search. Maybe they hope to remain relevant in the future and to compete with Google. No one knows for sure the full thinking behind Microsoft’s decision to buy Equivio.

The majority of experts are probably right: Microsoft probably does have information governance software in mind when buying Equivio. Microsoft probably still hangs onto the governance world view, and does not see it my way, or Google’s way, that it is all about search. Still, by buying good search code from Equivio, Microsoft cannot go wrong. Eventually, after the governance approach fails, which I predict will happen in ten years, or less, and Microsoft and the governance experts finally see the world like Google and me, it will help to have Equivio’s code as a foundation.

What Happens If Zoom Goes Boom?

In the short-term what companies may be adversely affected by the exit of Equivio from the e-discovery market?  I had first thought that K-Cura would be adversely impacted, but apparently that’s wrong. You can see how I would be confused because when you look at Equivio’s installed base web page, Equivio features K-Cura and its Relativity review platform. Equivio even includes a page on its website that promotes the Equivio Zoom tab on Relativity’s software. Nevertheless, K-Cura insists that it does not have anything built on Equivio’s technology. K-Cura states that its analytics engine is OEM from another company, Content Analyst. For that reason, it says its products will not be affected that much if Zoom is no longer a plug-in.

K-Cura says that its relationship with Equivio is simply that of a Relativity developer partner. It allows Equivio to develop an integration that lets users of Relativity access Zoom from within the Relativity platform. Those users still need to license Zoom separately and get the plug-in from Equivio. Relativity itself has Content Analyst’s engine fully baked in for the same kind of text analytics, predictive coding, etc. K-Cura states that functionality will still all be there, no matter what happens with Equivio. So I stand corrected on my original comments to the contrary.

So what does happen if Zoom goes boom? Companies that depend on Equivio may be in trouble, or may simply move to Content Analyst or someone else. Are they as good as Equivio? I do not know. But I do know there are huge differences in analytics quality and how well one company’s predictive coding features work as compared to another. That is exactly why Equivio existed, to license technologies to fill the gap. Apparently Content Analyst and others do the same. They do the research and code development on search so that most other vendors in the industry do not have to. The trade-off is dependency and the chance they may close shop or be bought out.

Only a few vendors have taken the time, and very considerable expense, to develop their own active machine learning software features, instead of licensing them from Equivio or Content Analyst. These vendors will now reap the rewards of having the rugs pulled out from under some of their competitors. Eventually even lawyers will realize that search quality does matter, that all predictive coding software programs are not alike.

There is a long list of other key users of Equivio products, some of whom may be concerned about losing Equivio’s products. They include, according to the list on Equivio’s own website:

  • Concordance by Lexis Nexis
  • DT Search
  • EDT
  • Xera iConnect
  • iPro
  • Law PreDiscovery by Lexis Nexis
  • Thomson Reuters

In addition, Equivio’s installed base web page lists the following companies and law firms as users of their technology. It is a very long list, including many prominent vendors in the space, and many small fry that I have never heard of. They may all be somewhat concerned about the Microsoft move, to one degree or another, according to how dependent they are on Equivio software or software components.

  • Altep
  • BDO Consulting
  • Bowman & Brooke
  • CACI
  • Catalyst
  • CDCI research
  • Commonwealth Legal
  • Crown Castle
  • D4
  • Deloitte
  • Dinsmore
  • Discover Ready
  • Discovia
  • doeLegal
  • DTI
  • e-Stet
  • e-Law
  • Envision Discovery
  • Epiq Systems
  • Ernst & Young
  • eTera Consulting
  • Evidence Exchange
  • Foley & Lardner
  • FTI Consulting
  • Gibson Dunn
  • Guidance Software
  • H&A
  • Huron
  • ILS Innovative Litigation Services
  • Inventus
  • IRIS
  • KPMG
  • USIS Labat
  • Law In Order
  • LDiscovery
  • Lightspeed
  • Lighthouse eDiscovery
  • Logic Force Consulting
  • Millnet
  • Morgan Lewis
  • Navigant
  • Night Owl Discovery
  • Nulegal
  • Nvidia
  • ProSearch Strategies
  • PWC
  • Qualcomm
  • Reed Smith
  • Renew Data
  • Responsive Data Solutions
  • Ricoh
  • RVM
  • Shepherd Data Services
  • Squire Sanders
  • Stroock
  • TechLaw Solutions
  • Winston & Strawn

This is Equivio’s list, and it may not be current, nor even accurate (some of the links were broken), but it is what is shown on Equivio’s website as of October 14, 2014. Do not blame me if Equivio has you on the list, and you should not be, but feel free to leave a comment below to set the record straight. Hopefully, many of you have already moved on, and no longer use Equivio anyway, like K-Cura. I happen to know that is true for a few of the other companies on that list. If not, if you still rely on Equivio, well, maybe Microsoft will still do business with you when it is time to renew, but most think that is very unlikely.


It is hubris to think that a force as mysterious and exponential as Information can be governed. Yet it appears that is why Microsoft wants to buy Equivio. Like most of establishment IT, including the vast majority of pundits in our own e-discovery world, Microsoft thinks that Information Governance is the next big thing. They think that predictive coding was just a passing fad that is now over. If these assumptions are correct, then we can expect to see fragments of Equivio’s code appear in Microsoft’s future software as part of general information governance functions. We will not see Microsoft come out with predictive coding software for e-discovery.

Once again, Microsoft is missing the big picture here. Like most IT experts today outside of Google, they do not understand that Search is king, and governance is just a jester. The last big thing, Search, especially AI enhanced active machine learning, i.e., predictive coding, is still the next big thing. Information governance is just a reactive, Don Quixote thing. Not big at all, and certainly not long-lasting. If anything, it is the dying gasp of last century’s records managers and librarians. Nice people all, I’m sure, but then so was John Henry.

Microsoft’s absorption of Equivio is a setback for search, for legal e-discovery. But at the same time it is a boon for the few e-discovery vendors who chose not to rely on Equivio, and chose instead to build their own search. It is also a boon for Google, as, once again, Microsoft shows that it still does not get search. You will not see Google fall for a governance dream.

Search is and will remain the dominant problem of our age for generations. Information cannot be governed. It cannot be catalogued. It can only be searched. Everyone needs to get over the whole archaic notion of governance. King George died long ago.

Google has it right. We should focus our AI development on search, not governance. Spend your time learning to search, forget about filing. It is a hopeless waste of time. It is just like the little Dutch boy putting his finger in the dyke. Learn to swim instead. Better yet, build a search boat like Noah and leave the governor behind.

e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part One

October 12, 2014

On Oct. 7, 2014, the Wall Street Journal reported that Microsoft had signed a letter of intent to buy what they called an Israel-based text analysis startup company named Equivio. The mainstream business press has virtually no understanding of the e-discovery industry, nor anything having to do with litigation support. They also seem to have no real grasp of what kind of software Equivio and others like it in the industry have created. They have probably never even heard of predictive coding! The business press, for all of these reasons and more, has no idea why Microsoft would pay $200 Million to buy Equivio. But, as this blog will show, we in the e-discovery community have plenty of ideas about that, and plenty to say about the whole deal.

Unlike the general business press, including the prestigious Wall Street Journal, everyone in the e-discovery industry knows, or at least knows of, Amir Milo, Yiftach Ravid and Warwick Sharp. We all know their company, Equivio. Even though the Wall Street Journal calls Equivio a start up company, we all know that is not true. Equivio was founded in 2004. Milo, Ravid and Sharp have been part of the e-discovery world from the very beginning. I must admit, however, that no one seems to know, not even Milo himself, why Equivio’s website always shows them hiding behind funny paper documents. But at least now we know why they are smiling.


The WSJ reported that the deal could still fall through, and neither side would comment. Of course, as all lawyers know, deals can always fall through; that is not news. But, in my experience, once the letter of intent stage is reached, and it is leaked, it is pretty much a done deal, barring only unforeseen due diligence problems. So, assuming the deal does go through, what does this mean to the e-discovery industry?

I asked a few of the leaders in the e-discovery world their reaction to the Microsoft Equivio deal. Most of them responded, some on the record, some off, and some both on and off. Here is my report on what everyone seems to agree is very big news indeed, at least for our industry.

Business Press View of the Microsoft Equivio Deal

Before I share the industry insights, it is interesting to see how the general business world views the deal. First, as the WSJ article that broke the news exemplifies, they do not even seem to know that an e-discovery industry exists, nor that Equivio is part of it. Instead, the WSJ just describes Equivio as a startup company that has created:

text analysis software that can group together relevant texts from large amounts of documents—including emails and other organizational social and collaboration networks—using machine learning algorithms. The algorithms generalize samples of texts marked as relevant to the issue at hand to apply the sorting logic to groups of texts, such as legal documentation.

The article states that the technology is already in use by organizations that provide litigation support services to law firms and corporate legal departments. Well, at least the business world seems to know that there is some sort of litigation support industry. The job our industry supposedly performs is also misunderstood. It is oversimplified to the point of absurdity. The WSJ level of comprehension in this area is exceedingly low. They think the job of litigation support is to, as they put it, try to extract relevant data, such as legal contracts, from massive amounts of documents.

So apparently the tens of thousands of us in the e-discovery world spend our professional lives trying to find legal contracts. My, what idiots we must all be! And by implication, what idiots Microsoft must be to spend $200 Million for a company that has developed software with machine learning to find contracts. Wrong! There is much more to predictive coding than meets the eye of the average business journalist, most of whom have never even heard of the term, much less of Mr. Milo. Whatever the final price may be, the Two Hundred Million number sounds about right to me. No one else I talked to seemed shocked by the price either, which was certainly not true about Hewlett Packard’s ill-fated purchase of Autonomy for $10 Billion. Although a few friends I talked to did say that the next time they have dinner with Milo they are going to let him pick up the check.

Industry Insider On-The-Record Comments

I will start off by reporting the comments where I have been given permission to quote with attribution. Then I will share a few comments where I was provided permission to quote, but not provide attribution. Some insiders also provided interesting background type information and speculation, which I do not have permission to quote or cite in any way. These comments inform my own opinions, which, you can rest assured, will also appear in this blog, but in Part Two, along with any straggler comments I may receive.

But first the attributable quotes, with thanks to the many who quickly responded to my vague questions, and agreed to go on the record about this very interesting, yet, as of today, still only rumored deal. As you can see my favorite industry insiders have a lot to say about this deal. Some of it is kind of corporate approved general writing, but there are also some controversial and strong opinions in here. Plus you will find some deep thoughts about our industry in general, not just this one deal.

Jason R. Baron (Of Counsel, Drinker, Biddle & Reath LLP, who broke the news to me of this story and so gets the lead): “I consider the deal to be a good thing for the legal tech sector. As we see bigger players with more market power recognizing the value that firms like Equivio contribute, we can hope that the bigger firms will leverage their greater influence to accelerate adoption of good IG practices.”



Craig, Ralph & Jason at some sort of e-disco event

Craig Ball (ESI Special Master and Attorney,  Computer Forensic Examiner, Author and Educator): “I see two options for an acquired Equivio: Either it empowers Equivio to grow in the legal marketplace, or (and the smart money’s here), it spells the disappearance of Equivio from the legal marketplace. If Equivio’s technology isn’t destined to wander the halls in Redmond like Diogenes carrying a Zune, it will be dedicated to internal use or baked into offerings not geared to e-discovery.

What I do not think the acquisition signals is a desire by Microsoft to compete in the fledgling e-discovery marketplace. Microsoft isn’t buying Equivio for its nascent presence in e-discovery. It wants Equivio’s technology–maybe for 365, maybe for Bing, maybe for a product yet to be named. Sometimes, big companies buy technology to stick it in a drawer. Look at how many great products went to Lexis-Nexis to die. One thing is fairly certain, you can bid Equivio adieu from litigation support.  …

As to the price paid for Equivio, it’s a windfall for Equivio; but for Microsoft, the price is as impactful as it would be for you or I to purchase a good steak.”

John Grancarich (Vice President, Product Management, Kroll Ontrack): “Microsoft is continuing to expand its data analysis offerings to provide solutions for larger data management problems. The ediscovery marketplace has some compelling and cutting-edge technologies that can move Microsoft closer toward achieving those broader goals. A few years back, Microsoft representatives started attending EDRM conferences – and other such think-tank meetings – which certainly helped them attain a deeper level of understanding about the robust capabilities across the entire technology provider landscape. When you think about combining the power of Microsoft with the software and service capabilities of key players in the ediscovery industry, the future ahead is exciting.”

John Tredennick (Founder and Chief Executive Officer of Catalyst): “I was surprised at the news but extend my congratulations to Amir and the Equivio team for their successful outcome.

I have no inside information on this deal but would suspect that Equivio’s move into the Information Governance space might have been as attractive to Microsoft as their e-discovery background. IG systems can involve billions of documents (rather than the millions in e-discovery) so the problems are of a different magnitude. Systems that can make sense of such volumes will be at a premium in this new era of information management.”

Kenneth J. Withers (Deputy Executive Director, The Sedona Conference): “First, all we know is that there is a letter of intent signed between Microsoft and Equivio, in which Microsoft states its intention to acquire Equivio for $200 million. We don’t have any details on exactly why MSFT is doing this, or what it plans to do with Equivio’s products, current client base, IP, or staff. Based on a quick look at Equivio’s web site, it doesn’t look like they have any high-ranking female executives, so I think we can rule out “executive and staff diversity” as a goal in this acquisition. Beyond that, it may be a mistake for readers of eDiscovery team to think that entry into the legal marketplace is a goal, either. The eDiscovery market is – or soon will be – dwarfed by the larger Information Governance (IG) market, and that is an area in which MSFT really needs to step up its game, especially in relation to SharePoint. For several years, information professionals have viewed SharePoint as a proverbial poisoned apple, symbolizing the potential for both great knowledge and great sin. You can’t keep people from biting at the apple, you can only manage the consequences of immediate expulsion from IG Eden into a wilderness of terabytes of fractured data. I’m sure it has not been lost on MSFT executives that there are at least a dozen third-party IG solution providers purporting to tame data generated by Microsoft Office applications and stored in SharePoint. There are many facets of IG that are amenable to smart automated solutions. As volumes of data continue to grow exponentially, an advanced data analytics component to an overall IG suite of applications is absolutely essential. Readers of this blog may think first in terms of eDiscovery, but for many companies providing data analytics solutions, eDiscovery has become the market testing ground for the much more lucrative IG market.

If you can make it work for the litigators, there is hope for the rest of the world. And not only would you have a good IG tool, but you would also still have a solid eDiscovery tool that could be built into a client’s (or law firm’s) Microsoft deployment. So this acquisition might also be another step in the mainstreaming of eDiscovery – taking some of the most costly and least lawyerly tasks out of the hands of the big law firms and third-party legal service providers, and enabling small businesses, small law firms, and even individuals to cost-effectively engage in eDiscovery. As I mentioned, I don’t think that is MSFT’s primary motivation. But if MSFT offers a serious eDiscovery tool to the masses, I think that many of the legal service providers (and large law firms with eDiscovery search and processing divisions) will need to reexamine their business models.

Largely through its eDiscovery offerings, Equivio has built a solid reputation and a respectable client base, so I am not surprised that MSFT would look to it as a potential partner (or meal, depending on your point of view) to add advanced data analytics to an emerging suite of applications for the IG market. And for MSFT, the $200 million offer is equivalent to a rounding error in their overseas tax liability, so it’s a bargain, too. And I don’t equate this with HP’s acquisition of Autonomy at all.

All this is speculation, of course. I’m not qualified to predict what MSFT will do, but by acquiring Equivio, MSFT is positioning itself to compete with IBM (which has Watson and lots of other R&D in the works) and Google (which is, after all, Google) in large-scale data management through analytics.”


Bill Hamilton (Partner, Quarles & Brady): “I see it as a seismic shift.”


J. William “Bill” Speros (Attorney Consulting in Litigation Management): “Practical. Rational. Responsible. Therefore, not natural for a big company.”


Bruce I. Blank (Director, Litigation Services & Support, Foley & Lardner LLP): “With an organization like Microsoft and the endless list of research they are conducting you can only guess at why they would purchase Equivio and that is exactly what I am going to do, guess. Microsoft has been inching closer and closer to the discovery world for several years now. With the rollout of Exchange 2013 and the in-place discovery search tool being integrated into the discovery management system it is clear there is a defined focus on collection for hold and discovery purposes. There was an announcement today that with Office 365 and OneDrive Microsoft is going to separate attachments from emails. The first reaction is that it sends a chill down one’s back, at least for those that had to deal with linked attachments to emails in the past, but maybe Microsoft has solved that problem with key pointers that will preserve family relations. Where does Equivio fit in? Microsoft is using Keyword Query Language (KQL), which they claim will easily construct powerful search queries to search content indexes for both on-premises and on-line; however, this is just keyword searches, which is very limiting in many ways. But what if you incorporate Equivio’s analytics in conjunction with the KQL? Keyword searching now with predictive coding analytics raises the bar of credibility. Maybe this, now common workflow, is necessary to make this palatable to the corporate market.

I am not really sure at the end of the day what gets accomplished using Microsoft e-discovery tools, particularly if we are dealing with a large organization with many moving parts. Microsoft clearly says we can search it if it is in our ecosystem but if not, you need another way to search for discovery. In other words, if you are using OneDrive, SharePoint, Exchange, or Office products, for example, then they will be able to search it. If you are using any Apple or Google tools (or many others) on your computer they might not get searched, just tagged as un-searched or un-readable. Thus potentially throwing any search term reports off and possibly even requiring a third party vendor to finish the job to put the pieces back together again. Relying on an inexperienced attorney or IT member crafting the search strategy coupled with the limits of Microsoft e-discovery tools could be a recipe for disaster.

Well, as I said, all of this is only a guess. There is much to learn yet about what they are up to but on the face, I wonder what the discovery attorneys I work with would have to say about clients’ IT getting even more involved in the discovery process. The discovery industry has grown in sophistication not only by the technology tools used but by the flesh the attorneys have had to pay for the inconsistencies and inaccuracies of tools given to our IT friends. There are some good things that Microsoft is doing in discovery but I would certainly recommend solid experienced guidance with your discovery projects. I started this with just a guess but if my guess is close then one must ask not only who will be doing discovery in a few years but what “won’t” be discovered?”


Melinda Levitt (Partner, Foley & Lardner LLP): “Microsoft is certainly a giant of the electronic information world and in many, many ways it has completely changed the way that people communicate in the modern world. But, as an attorney who practices in the ediscovery “space” – I have seen no indication that Microsoft is a player, or has any real experience or expertise in this field. In matters in which we have been involved, we have had clients who bought tools that they were told would make preservation and collection easier to do and it could all be done in-house – and we have seen the significant flaws with those systems because they were not designed by attorneys – or the specialist litigation technician – who really understood, based on hands on experience over many years, the nuances of ediscovery and what is needed . . . what is responsive, or maybe responsive . . .  what may be confidential and worthy of protection, or what is not. Etc. Perhaps Microsoft has such people and is working with them to take us to a next generation of ediscovery/big data management – including most particularly managing enormous caches of emails and making them searchable with advanced analytics . . . . but without some indication that they understand the intricacies of what is involved – that special place where sophisticated legal skill, very specialized technical understanding, and the “art” of practicing law meet – then I remain skeptical and worried.”


Gregory P. Bufithis (Attorney & Managing Director, eTERA Consulting Europe): “I am intrigued by the Microsoft/Equivio tie up. My initial reaction was the deal makes sense given Nadella’s strategy for the “New Microsoft”, what he calls the “data analysis” Microsoft. And I agree with Ralph: it seems to be an admission by Microsoft that they do not have any real AI capacity as concerns document search. But I thought Microsoft would have gone after somebody else. The acquisition of Equivio is no surprise. From what I understand, they have had a book out on the street for the past 12+ months. It will be interesting if anyone else makes a play now.

But as Ralph says, one’s first thought: is Microsoft really serious about playing in “our” legal sandbox? Just a few points:

One point to immediately dismiss: the purchase price of $200 million being bandied about in the press. In the Bloomberg review, the analyst thought $200 million was far too high. I have been following the business press in Israel and the general take is the price will be “far lower”. But we have no way of knowing since we have not seen any press releases, no signed letter of intent being waved about. The 8-K disclosure requirements do not mandate the disclosure of letters of intent and other non-binding agreements so I doubt we’ll see the LOI via an S.E.C. filing by Microsoft. But if it is $200 million … way to go Amir!!

Nadella [Microsoft CEO, Satya Nadella, shown right] has been trying hard to redefine his company for the post-Gates/Ballmer era. If you took the time to read that 3,100 word “positioning memo” he sent out over the summer to every Microsoft employee (and to the world in general) it’s all in there. But as a media guy who spends a lot of time in the digital biosphere, I have a short note to Mr Nadella: your memo was waaaaay too long with too many messages. Your troops either stopped reading it or just forgot it as soon as they scanned it. A video would have worked better at that length [point of reference: Tim Cook’s excellent video to his troops after taking command of Apple]. And Satya, I mean really: fake cheerleading and empty words like “synthesize” and “potential” and “revolution” will bring out all of the cynics. Like me.

Yet I found it a fascinating document for many reasons. Talk is cheap and Nadella has to produce and match his talk of “potential” and “synthesize” and “revolution” with “real substance”. You cannot talk your way into continued technology leadership. A press release is fine and we all love cultural revolutions (and, yes, I admit it: he does look perfect in a shirt and jacket and jeans; Tim Cook would love to look that good) but he has a problem: he begins at the altar of innovation and for Microsoft that means a tradition of pretty much stealing technology, so Microsoft’s “tradition of innovation” is a bit hard to even detect, much less revive. …

Nadella has talked endlessly about Microsoft staying important to personal and organizational productivity by emphasizing, it seems, the coordination of information in a world where users have multiple devices and there are a growing number of devices independent from any user. Oh, you know. That damn, that infernal Internet of Things (IoT). But an obvious problem: for the first time in a long time Microsoft is not a leader in any of this. Microsoft is just one of many companies in analytics and business intelligence. Yes, he sleeps a little better knowing Samsung must continue to pay him $1 billion+ a year in patent licenses because Samsung phone technology is dependent on Microsoft tech they patented but never did much with. At least on that score the old Microsoft … Bill Gates’ all embracing essence of Microsoft … was a bit innovative, establishing de facto standards. But while Windows is the top OS, it’s pretty much ignored in mobile and IoT.

Yes, Microsoft makes a boatload of money. But in Silicon Valley there are two sayings that everyone regards as truth. One is that profits follow relevance. The other is that there’s a difference between strategic position and financial position. It’s easy to be in denial and think the financials reflect the current reality. They do not. Around three-quarters of Microsoft’s profits come from the two fabulously successful products on which the company was built: the Windows operating system, which essentially makes personal computers run, and Office, the suite of applications that includes Word, Excel, and PowerPoint. Financially speaking, Microsoft is still extraordinarily powerful. In the last 12 months the company reported sales of $86+ billion and earnings of $22+ billion. It has $85+ billion cash on its balance sheet. But the company is facing a confluence of threats that is all the more staggering given Microsoft’s sheer size. Competitors such as Google and Apple have upended Microsoft’s business model, making it unclear where Windows will fit in the world, and even challenging Office.  …

Yes, Microsoft has a boatload of money and thousands of good employees but its management culture works against true innovation. Nadella figures he’ll “buy” that culture (Minecraft and Equivio being examples) to right the ship. So how does Equivio figure in this? A few points:

  1. Nadella is an engineer with advanced degrees in computer science. So he knows that clean logical code simply does not exist in some abstract conceptual space. It “plays” in a complexly shaped, intricately interacting digital information universe. We all know that having been in the e-discovery trenches. And Microsoft and Equivio have been in the trenches together. Equivio has been working with many Microsoft technologies … including Windows XP, SQL Server and SharePoint Server … since 2006, if not earlier. One thing the Microsoft reps noted at LegalTech this past year was that the integration of Equivio’s technology added an important layer of structure to SharePoint data repositories. One chap said “we have seen that it clearly has expedited a corporation’s response to e-discovery requests, internal investigations and regulatory tasks.” And we know from market chats that Microsoft and Equivio have talked about integrating Equivio technology into “Track Changes” and “Compare Documents” and other functionalities within Word. So we must assume that Nadella pretty much knows what he is buying. He now has the chance to weave together the “potential” and the “synthesis” and the “revolution” and the “real substance” he has been talking about. Granted, at a low level but a key one in keeping with the e-discovery model they have been building.
  2. Microsoft has been pushing machine learning in more of its products. I saw Nadella speak earlier this year and his whole focus was “data analysis” and “machine learning.” We know that machine learning traditionally requires complex software, high-end computers, and seasoned data scientists who understand it all. Nadella’s pitch has been that for many startups and even large enterprises “it’s simply too hard and expensive.” So he has moved to bring machine learning/predictive analytics to a more accessible level to a much broader audience. Equivio can help that.
  3. But the point (always in these acquisitions): can Microsoft execute? Can they integrate Equivio? The problem with Microsoft has been they lack consistency and perseverance. They always seem to be looking for quick success. Each new Microsoft leader comes in and implements his own stuff. And I will not even get into the infamous Gates/Ballmer internal wars after Ballmer took over. … You only need to look at a company like Symantec to realize one can gobble up big and small vendors alike but you really need to integrate, market, and sell them. I have studied the M&A market. Rarely is an acquisition failure a function of buying bad companies/bad technologies. Almost every time it is a direct function of the inability to execute on a broad vision. I will use … yet again … my essay on the H-P acquisition of Autonomy back in 2011. HP wanted to make itself more like IBM, which had been successfully revived by Louis Gerstner. But Gerstner refused to act precipitously after taking over IBM. He had a plan, he had a vision and he and his team slowly showed their vision worked by establishing their operational credibility. They satisfied the Board, they satisfied the market, they satisfied the pundits. That has been something H-P and Symantec have never been able to do.
  4. And the biggie vis-à-vis Equivio: in so many cases Microsoft has acted the “rogue” takeover company, attacking companies to reduce their value then buying them up without the knowledgeable team in place. In the press it became known as “Microshaft”. It would take over a company and then every single person with a brain would immediately resign leaving the source code in the hands of interns and idiots. Will Nadella do the smart thing and keep the Equivio team? Otherwise … as an analyst said about a previous Microsoft acquisition … “they end up handing the new source code over to those without a clue, those who don’t care and they create Windoze.”

… Will the electronic data discovery (EDD) herd be winnowed down? The EDD pundits say eventually there will be only 7-10 companies doing this. Up until this week I would have not put Microsoft (via Equivio) on that list. But nobody can accurately predict because we are so, so early in this game.” For more on Greg Bufithis’ many interesting thoughts on the deal, see the Project Counsel blog.


Rob Robinson (Managing Partner, ComplexDiscovery Solutions): “The rumored acquisition highlights the ever-increasing need for organizations to do more for information governance purposes than just store documents. Equivio will offer Microsoft enterprise customers additional ways to organize, cull, and work with their text-based documents. However, the combined offering may still leave organizations challenged in terms of being able to work effectively with non-text or poor-text documents – and in some industries like Oil & Gas, those types of documents can account for a significant percentage of entire collections. So bottom line: appears to be good progress for Microsoft in dealing with text-based documents, but still a need and opportunity for technologies dealing with all document formats.”


William Webber (Information Scientist, eDiscovery Consultant): “It’s not the case that Microsoft lacks the capacity for applying AI to document search; indeed, the research group at Bing has been somewhat in advance of Google in applying machine learning approaches to web search. I think that a company like MS buys a company like Equivio not for their technology in the abstract (and while I think Equivio has been successful in finding the appropriate application of technology to e-discovery, I don’t think that their technology itself is all that revolutionary); rather, a MS would buy an Equivio as a path into an industry, first for having a concrete product, and second for having practical experience (and a customer base) in that industry. Is Equivio a good choice for MS on this basis? Well, I guess time will tell. My main query here would be that Equivio seem to have an offering narrowly focused on certain technologies, most specifically predictive coding, while leaving much of the surrounding work (processing, review, producing) to other products. MS-Equivio have to decide whether they’re going to continue in this technological niche, or to expand to a full e-discovery management system. Staying in the technological niche seems an odd choice, and a potentially perilous one, as it assumes that full-featured providers will not be able to replicate Equivio’s technology offering on their own. But will an Equivio under MS ownership have the focus, drive, and industry understanding to expand their offering to a full e-discovery suite?”


Barclay T. Blair (President and founder of ViaLumina and the Executive Director and founder of the Information Governance Initiative): “In the past two weeks, three of the Information Governance Initiative’s vendor supporters have been involved in M&A transactions. Fontis International was purchased by Iron Mountain, and, according to some reports, Equivio is being acquired by Microsoft. These transactions are exciting to me as they provide the latest validation that the information governance market is taking hold. In the case of Fontis, Iron Mountain was attracted to the central IG rules engine that Fontis provides. In the case of Equivio, I can only surmise that Microsoft was not only interested in Equivio’s predictive coding offering for e-discovery, but also in the long-term growth opportunity represented by its productization of predictive coding for proactive management and remediation of content, another IG use case. I expect that we will see other transactions involving leading IG companies as this market matures and develops. This topic has come up in advisory sessions innumerable times in the past couple of quarters both with my IG provider clients and with investors. Hot acquisition targets are any company that provides automation of IG, or so-called auto-classification. Everybody is looking for it and the smart money realizes that there is a huge opportunity there for companies who can bring it to market in a way that is easy, cost-effective and that scales across multiple categories (a very different use case than e-discovery). There are dozens of small, specialized companies in that space. Remember it encompasses a much broader world than e-discovery. Clearly there will be massive consolidation as the big horizontal enterprise software companies move into this space. Enterprise software companies hear the same thing over and over from their customers: we need help managing the deluge of unstructured information.
Even the new wave of content management vendors like Box, which claim to “not be your father’s content management system” are adding decidedly unsexy features like workflow and retention management.

The most important question moving forward is: who has the data? As more data flows out of company owned and operated data centers and into broad horizontal cloud environments like Google, Amazon, IBM, Microsoft, HP, and even Apple and “data lake” providers like Pivotal, and into vertical-specific hosted applications (construction management, sales management, manufacturing management), I believe we will see wide-scale adoption of things like industry-standard taxonomies, automated retention schedules, file plans, information protection programs, etc. The vendors who hold this data are starting to see and realize the opportunity to add increasingly valuable services on top of the data.”

Industry Insider Off-The-Record Comments

The following are verbatim quotes, but without attribution. All I can say is that these insiders work for large organizations and do not have permission to speak on the record. But hey, I got them to speak! I will let you imagine who said what.

Anonymous One: “EDiscovery is data dialysis. We take the data out of the corporate body and scrub it, then re-inject the data into the enterprise requirements flow (in this case, production of responsive ESI). In the long run, if the corporate filters are functioning well (like a healthy liver) externalizing ESI to filter it appropriately will not be necessary. EDiscovery will be part of enterprise IT, inside the firewall. Microsoft apparently believes this. Microsoft only acquires 2.0 and above versions. Their faith that advanced analytics of ESI by Equivio are at the 2.0 level is good news for every predictive coding solutions provider. Predictive coding is now in play for enterprise IT. Look for more acquisition and integration.”


Anonymous Two: “Anomaly or harbinger? The discovery market still seems very fragmented and immature, and curiously so. If we were to start the clock around the time the 2006 amendments came into force and compare the discovery market to other markets, say in some technology sectors, the discovery market has remained very much immature. Just when one thinks that the market is starting to rationalize, new entrants arrive. At times it feels like a game of whack-a-mole. Whether this market immaturity is desirable or undesirable depends on your point of view. Put another way, how do the buyers of discovery services view a more consolidated market? Is it a good thing, as it may engender standardization and more efficient vendor management, or will it result in less competition in the market? I just don’t know.

Economic theory most certainly teaches that, in the long run, the market will consolidate and mature but, then again, “in the long run we are all dead.” (A quotation attributed to John Maynard Keynes, I believe).”


Anonymous Three: “There is no question that Microsoft was a pioneer in business software. But the kind of workflow automation software at the core of eDiscovery and information governance is not Microsoft’s area of expertise, which is likely why they’ve waited so long to enter this market.

The easiest way to close gaps in their information governance and eDiscovery portfolio is by acquiring technologies like Equivio, but that still leaves a lot of unanswered questions for their customers and the eDiscovery market as a whole. For example, all of the eDiscovery providers who license Equivio’s technology for predictive coding may find themselves in a major bind because they don’t own their own machine learning technology. It will be interesting to see how this move impacts companies like kCura and their Relativity product.

On the customer front, Microsoft also needs to figure out how to handle data collection from laptops, desktops and systems beyond Exchange and SharePoint. There are also a lot of lingering questions about Microsoft’s search limitations, scalability and integration between multiple on-premises and cloud offerings that have to be answered.

Buying Equivio could help solve problems related to analytics, review and production, but they still have a long way to go to catch a lot of enterprise eDiscovery and information governance competitors who have a big head start.”


Stay tuned for Part Two of this blog next week where I will try to synthesize all of these great comments and bring it all to a conclusion. Are you an industry leader that would like to comment, but your words have to be undercover? Perhaps you are with Microsoft or Equivio? Or one of their competitors, where it might be unseemly to speak on the record, especially if it is not all politically correct? I will keep your identity secret and I never reveal my sources. Ready to talk openly, but I just failed to ask? My bad. Send me an email. I did not have time to ask all of the important folks that I wanted to hear from. I will try to include your remarks in Part Two.

