There is a micro-battle brewing between Information Governance and Search in the legal world that reflects a larger conflict in the greater information technology world. I touched upon this in my blog last week, e-Discovery Industry Reaction to Microsoft’s Offer to Purchase Equivio for $200 Million – Part Two. I speculated that Microsoft was probably buying Equivio because it wants to improve its information governance products. That is what most in the e-discovery industry seem to think, and, after all, law is where Equivio now operates. I also speculated that it might instead be a pure search play on Microsoft’s part, that Microsoft might already understand, like Google clearly does, that search is now king, and Information Governance is a fading fad.
In my blog last week I broke ranks with almost all other specialists in e-discovery and turned against Information Governance. Instead, I took sides with Search, and opined that the preoccupation with classification, retention, and destruction of data would soon stop being a viable, efficient activity. Instead, I suggested that we all focus on the googlesque approach, one that I had previously disdained, of save everything and search instead of classify.
I still hold Information Governance in high respect, and think it has another five years or so of useful contributions. But ultimately I think the classify and control lock-down approach of IG is futile. It cannot withstand the continuing exponential growth of data, nor the basic entropy forces aligned against all attempts to govern, especially attempts based on all-too-human rules and compliance. My prediction is that within five to ten years, IG will no longer be worth the effort. This projection assumes continuing exponential growth and future improvements in search. Breakthroughs in search would be nice, but my projection does not depend on that. It assumes instead a slow, steady improvement of search technologies. For now IG helps search, and so is worth it, but not for much longer.
For the immediate future a dual approach, govern and search, is still viable. It may ultimately be a Don Quixote quest, but not just yet. Still, soon enough, and perhaps in even less than five years, it may no longer be worth the time and expense to try. At least this is how I see it, and, as I must in all honesty admit, this is still very much a minority report in the world of legal technology.
Since my blog last week I have learned much more about this conflict in the larger world of technology. Although I am a lonely voice in the legal technology world, which after all is not unexpected, since law itself is an attempt to govern, I have plenty of good company in the general technology world. There is not only Google, whom I expected, but also EMC, GE, and a host of others. The debate is part of larger issues surrounding Big Data. The outcome will impact everyone’s life for years to come. Govern or Search is not just a legal issue. It is a cultural issue.
Search or Governance:
What is the best approach to survive the information deluge?
In last week’s blog I played the role of the child pointing out that the Emperor has no clothes. I suggested that Information Governance is really little more than e-dressed-up records management. This is not a popular view, but the Information Governance world so far has not reacted too vehemently. No death threats like Gamergate. Even a few positive comments.
Perhaps most of the IG world hopes that I will just go away and stop this conversation. Sorry. Not going to happen. In fact, I feel compelled to summarize and expand upon what I said last week. These are critical issues. They not only challenge the legal world, but everyone who uses a computer. We are all suffering from information overload. We are all looking for a solution. Will we cope by search, or by vertical forces of governance, man-made laws?
As a person who has devoted his life to law and rules, and been a lawyer focusing on technology for over 34 years, I know firsthand the limitations of law and governance. I am sure that rules are not the way to go. It is better to rely on search and technology. You may say there is no reason to choose, but really there is. Their methods ultimately diverge, and resources, like attention, are limited. The focus must be on search.
I hope this will become a full-scale dialogue, as the late great Richard Braman would have us do. But if not dialogue, then at least a public debate. Perhaps I will square off with an IG leader at Legal Tech? Time will tell. We are waiting for some vendors to step up to the plate and sponsor such discussions. In the meantime, if you did not read last week’s blog, I urge you to do so, Part Two at least. Here are a few excerpts that I would like to emphasize:
I think the establishment majority in our industry is deluding themselves into thinking that information is like paper, only there is more of it. They delude themselves into thinking that Information is capable of being governed, just like so many little paper soldiers in an army. I say the Emperor has no clothes. That information cannot be governed.
Electronic Information is a totally new kind of force, something Mankind has never seen before. Digital Information is a Genie out of the bottle. It cannot be captured. It cannot be managed. It certainly cannot be governed. It cannot even be killed. Forget about trying to put it back in the bottle. …
Essentially information is free, and wants to be free. It does not want to be governed, or charged for. Information is more useful when free and when it is not subject to transitory restraints. …
Regardless of the economic aspects, and whether information really wants to be free or not, as a practical matter Information cannot be governed, even if some of it can be commoditized. Information is moving and growing far too fast for governance.
I stated last week that information inflation is like a nuclear bomb, and we have reached the tipping point of no return in the atomic fission chain reaction. I invoked the fission vision to try to help convey what exponential information growth really means. For those reasons I called Information Governance a noble, but futile, Don Quixote quest. I asserted that it is impossible to file and classify an e-data world that doubles in size and complexity every couple of years. Yet, the Information Governance experts try to do just that. I know, I used to be one of them. Here is how I put it last week.
[W]e have a new breed of information governance experts running around who serve like heroic bomb squads. Some know that it is just a noble quest, doomed to failure. Most do not. They helicopter into corporate worlds attempting to defuse ticking information bombs. They build walls around it. They confidently set policies and promulgate rules. They talk sternly about enforcement of rules. They automate filing. They automate deletion. Some are even starting to make robot file clerks.
Information governance experts, just like the records managers before them, are all working diligently to try to solve today’s problems of information management. But, all the while, ever new problems encroach upon their walls. They cannot keep up with this growth, the new forms of information. The next generation of exponential growth builds faster than anyone can possibly govern. Do they not know that the bomb has already exploded? The tipping point has already passed?
Information governance policies that are being created today are like sand castles built at low tide. Can you hear the next wave of data generated by the Internet of Things? It will surely wash away all of today’s efforts. There will always be more data, more unexpected new forms of information. Governance of information is a dream, a Don Quixote quest.
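The doubling arithmetic behind these excerpts can be made concrete with a rough sketch in Python. Every number in it is an assumption chosen for illustration, not a measurement: the starting volume, the two-year doubling period, and the idea that a governance team's classification capacity grows by a fixed amount each year. The function names are hypothetical.

```python
# Illustrative sketch only: all figures are assumptions, not measurements.

def data_volume(initial_tb, years, doubling_years=2.0):
    """Volume after `years` if the data doubles every `doubling_years` years."""
    return initial_tb * 2 ** (years / doubling_years)


def governed_capacity(initial_tb, years, added_tb_per_year):
    """Volume a team can classify if its capacity grows by a fixed amount per year."""
    return initial_tb + added_tb_per_year * years


def years_until_overwhelmed(initial_tb, added_tb_per_year,
                            doubling_years=2.0, horizon=50):
    """First whole year in which the data outgrows the capacity, or None."""
    for year in range(1, horizon + 1):
        volume = data_volume(initial_tb, year, doubling_years)
        capacity = governed_capacity(initial_tb, year, added_tb_per_year)
        if volume > capacity:
            return year
    return None


if __name__ == "__main__":
    # Assumed scenario: 100 TB today, capacity grows by 100 TB every year.
    print(years_until_overwhelmed(100, 100))  # 6
    print(data_volume(100, 10))               # 3200.0 TB after ten years
```

Under these assumed figures, a team that doubles its starting capacity in the first year still falls behind in year six. Making the yearly capacity gain more generous only postpones the crossover, because linear growth cannot keep pace with repeated doubling.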
I used yet another analogy last week of the little Dutch boy with his finger in the dyke trying to stop the flood of data from overwhelming us all. I explained that this was impossible in my view because of the exponentially growing volume and complexity of data. I concluded that search was the only answer in tomorrow’s world, that governance was no longer possible. Instead of making and trying to enforce information rules based on archaic notions of governance, we should instead harness all of our efforts to improve search.
Do not waste your valuable time and effort trying to file information. Just search for it, when and if you need it. You will not need most of it anyway. …
It is hubris to think that a force as mysterious and exponential as Information can be governed. …
Search is and will remain the dominant problem of our age for generations. Information cannot be governed. It cannot be catalogued. It can only be searched. …
Google has it right. We should focus our AI development on search, not governance. Spend your time learning to search, forget about filing. It is a hopeless waste of time. It is just like the little Dutch boy putting his finger in the dyke. Learn to swim instead. Better yet, build a search boat like Noah and leave the governor behind.
In this two-part blog I will explore further aspects of this debate, with the emphasis in Part Two on the views in the larger technology world, not just the legal e-discovery world. But, as that is my core competence, I will once again start in our litigation sandbox and work my way out into the big beach of all technology. It turns out that this beach is on a very large Data Lake surrounded by open-source software with the funny name of Hadoop. More on that later.
Information Governance from an e-Discovery Lawyer’s Perspective
Everyone in the e-discovery world is already pretty familiar with the ideas and goals of Information Management. It is the big hit in e-disco conference circles, having replaced Predictive Coding sometime last year as the latest thing. Every CLE event now has one or more panels devoted to the topic. There are even several new organizations set up to promote Information Governance.
A few large cutting-edge law firms even have Information Governance practice groups. They do such things as assisting clients to establish an information governance framework to link information creation and use to business objectives. They also help clients to categorize information during its useful life, and then delete it when no longer needed. These services require a review of the company information in five steps:
- Information Inventory.
- Formation of an Information Governance Steering Committee.
- Information Policy and Protocol Review.
- Policy and Protocol Creation and Modification.
- Policy Implementation and Enforcement.
At the end of a typical IG Review a corporate client should have streamlined the creation, use and disposition of its information. This should, in turn, increase its business efficiency, reduce the risks involved in the over- or under-retention of information, and increase its ability to react to litigation or regulatory events.
Most lawyers specializing in e-discovery know the drill and have engaged in a few such IG projects. I have worked on several IG reviews, following similar steps. They typically involve many meetings, team assembly and team activities, and information work flow studies and discussions. Most of what a good IG lawyer does is educational. For instance, they teach everyone the basics of a records life cycle. They help the client to make their own decisions as to what is right for their company. They provide suggestions to consider, and set out the pros and cons to help the client’s team make good decisions. It can all become very complicated, especially with large IT infrastructures. The fees involved can be considerable. Six and seven figure projects are not unheard of.
There is no one-size-fits-all Information Governance plan. It is not a form-driven practice. It all depends on so many things. For this reason a good IG lawyer acts as both a trainer and coach for the corporate team. The team makes the final decisions, not the outside legal counsel. Nevertheless, IG lawyers can have a big impact on those decisions. Most IG lawyers are inclined towards reducing data retention, as they know very well, or think they do, that the costs of search and review of data in the context of litigation are very high. They tend to steer the team in the direction of data destruction. It is heretical to say just save everything forever, and focus on search instead. Alas, that is why my excommunication from the IG world is certain.
I used to be the way they are. ESI feared me. I was all about killing data as soon as you no longer had a business need for it. I was all in favor of short retention schedules. But, that was then. That was before I really mastered predictive coding, which in my version means active machine learning. That was before I understood, much better than I used to, that we are living in a whole new world of Big Data Analytics. I now realize that it is possible to dramatically reduce the costs of document review. I now realize the incredible power of AI-enhanced search. I am starting to realize the potential value of large pools of seemingly worthless data. These realizations change everything.
My understanding and experience with Big Data Analytics has led me to a different view of data retention and governance. I now understand that more data can mean more intelligence, that it does not necessarily mean more trouble and expense. I understand that more and bigger data has its own unique values, so long as it can be analyzed and searched effectively.
This change of position was reinforced by my observing many litigated cases where companies no longer had the documents they needed to prove their case. The documents had short retention spans. They had all been destroyed in the normal course of business before litigation was ever anticipated. I have seen firsthand that yesterday’s trash can be tomorrow’s treasure. I will not even go into the other kind of problem that very short retention policies create, namely the pressure on a company to implement a lit-hold immediately. The time pressures to get a hold in place can be enormous and thus errors become more likely.
There is a definite dark side to data destruction that IG types do not like to face. No one knows for sure when data has lost its value. The meaningless email of yesterday about lunch at a certain restaurant could well have a surprise value in the future. For instance, a time-line of what happened when, and to whom, is sometimes an important issue in litigation. These stupid lunch emails could help prove where a witness was and when. They could show that a witness was at lunch, out of the office, and not at a meeting as someone else alleges.
Who knows what value such seemingly worthless data may someday have? Perhaps millions of emails of ten thousand employees about lunch could be used someday to prove or disprove certain class-action allegations. Outside of the little world of litigation, perhaps the information could help management make smarter business decisions. For instance, they could help a company to decide whether to open a company cafeteria, and if so, what kind of food its employees would really like to have served there. Information can prove what really happened in the past and can help you to make the right decisions. With smart search, there can be great hidden value in too much information. Businesses are starting to see this now where Big Data mining is all the buzz. We lawyers need to start doing the same.
The point is, with the never-ending uncertainties of tomorrow, you can never know for sure which information is valueless and should be destroyed, and which information has value and should be saved. There may be an unimaginably large haystack of information, and you may think it only has a few valuable needles. But, you never really know. Today’s irrelevant straw could be tomorrow’s relevant needle. With the AI-based search capacities we already have, capacities that are sure to improve, when you need to find a needle in these near-infinite stacks, you will be able to. The cost of storage itself has become so low as to become a negligible factor for most large corporations. Why destroy data when you can effectively search it and mine it for value?
Information Technology View on IG v. Search
The general IT world is also torn between going all-in with Search and continuing to try to solve the problem of too much information with Governance. Unlike the legal world, where my vote for search is still a new and small minority, in the IT world search is already a strong voice. Many in IT see attempts at Information Governance as misguided throwbacks to the pre-digital world. In the last year it seems to me that Search is gaining ground in the technology world. From what I see the retain and Search solution is surging ahead of the old-fashioned govern and destroy approach.
Consider, for instance, the policy of search stated by hot new companies like Pivotal, which is a joint venture between EMC, VMware, and GE. Pivotal’s public mantra is: Store Everything. Analyze Anything. Build the Right Thing.
Pivotal urges its customers to Store Everything, not just their organized databases, such as financial records. It provides the ability to store all types of data, including especially disorganized data, such as employee emails and texts, and to do so in the same place. That is the new gold standard. Pivotal explains the value of store everything this way:
Store everything to create a rich data repository for business needs. With unlimited, supported Pivotal HD enterprises never have to worry about data growth constraints or runaway license costs.
Its suite of Big Data software is designed to allow a company to store all data types in the same place, which it, along with EMC, and others, have started calling a Data Lake. All types and formats of ESI become readable, searchable, in the Data Lake. They do not have to be stored separately, nor searched and analyzed separately. The Data Lakes are also infinitely expandable. Unlike real lakes, they cannot flood. They can instead grow unhindered in cyberspace. All they need are more servers.
These are major breakthroughs and mean the inevitable end of separate data silos by format type and size. This allows you to, in Pivotal’s words, leverage all your data, forever, and place it all in a centralized Business Data Lake. You can analyze multiple data sets and types that live in the Business Data Lake. This allows you to determine the integration value of multiple data sets and types. It also makes storage of Big Data much less expensive.
Bottom line, when all of your data is saved forever, and subject to advanced search analytics, you are empowered to build the right thing. In Pivotal’s words, building the right thing means to deliver a transformative solution to meet today’s demanding business needs. For business that means creation of new products, new advertising, new sales and business methods. For law it means building your case, finding evidence, and creating new legal methods. The promise of Big Data is changing everything.
Data Lakes Are Made Possible by Hadoop
Hadoop is an open-source software framework that underlies most Big Data repositories, including what Pivotal, EMC, and others call Data Lakes. Facebook, for example, uses Hadoop, and claims to have assembled the largest Hadoop cluster of computers in the world, housing 21 petabytes of storage as of 2010. By March 2011, its Hadoop clusters had grown to 30 PB, which Facebook says is 3,000 times the size of the Library of Congress. Facebook’s Hadoop Data Lake supposedly grows at the rate of half a petabyte a day. Although Yahoo! uses open-source Hadoop, Google does not. Google instead uses its own proprietary version of MapReduce and Google File System, and keeps the details as closely guarded trade secrets.
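For a sense of scale, the Facebook figures quoted above can be run through some quick arithmetic. The sketch below uses only the numbers in the preceding paragraph and assumes decimal units (1 PB = 1,000 TB). The last calculation is a what-if: the figures do not say which period the half-petabyte daily rate describes, so applying it to the 30 PB cluster is an assumption.

```python
# Back-of-the-envelope arithmetic on the figures quoted above.
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000

cluster_2010_pb = 21         # reported size as of 2010
cluster_march_2011_pb = 30   # reported size by March 2011
library_multiple = 3000      # "3,000 times the size of the Library of Congress"
daily_growth_pb = 0.5        # "half a petabyte a day" (period not stated)

# Library of Congress size implied by the 3,000x comparison.
implied_library_tb = cluster_march_2011_pb * TB_PER_PB / library_multiple

# Growth between the two reported sizes.
growth_pb = cluster_march_2011_pb - cluster_2010_pb

# What-if: days needed to add another 30 PB at the quoted daily rate.
days_to_add_30_pb = cluster_march_2011_pb / daily_growth_pb

print(implied_library_tb)  # 10.0 (TB)
print(growth_pb)           # 9 (PB)
print(days_to_add_30_pb)   # 60.0 (days)
```

The 3,000-times comparison implies a Library of Congress of roughly 10 TB. At half a petabyte a day, a repository would add the equivalent of the entire March 2011 cluster in about two months.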
Hadoop is one of the new technologies driving the search versus governance debate. Before Hadoop, store-everything, scalable, central data repositories were not possible. The relatively small separate silos of data before then made it very difficult to store large amounts of data, and to search everything. First, you had to find it, then you had to employ a number of different search methods. This made it hard to mine the value of data. The separate locations also made it much more expensive to store and maintain Big Data.
Data silos were a big impediment to search as a solution to the data deluge. Conversely, they were somewhat of a help to governance. But now, thanks to open-source Hadoop, and the hundreds of established companies and start-ups that provide for-profit applications for Hadoop Data Lakes, the days of data silos are numbered. With Hadoop it is now possible to put all data in one place, forever, and, with some limitations, search it all, whenever you want.
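For readers curious what "put it all in one place and search it" looks like in miniature, here is a toy sketch in plain Python. It is not Hadoop and does not use any Hadoop API. It only imitates the general map-then-reduce pattern that MapReduce is named for, on a few made-up records of different types. The records, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mixed-format records standing in for files in one repository.
LAKE = [
    {"id": "email-1", "type": "email", "text": "Lunch at noon on Friday?"},
    {"id": "text-7", "type": "text", "text": "running late, start the meeting without me"},
    {"id": "memo-3", "type": "memo", "text": "Friday meeting moved to the cafeteria"},
]


def map_phase(record):
    """Map step: emit (word, record id) pairs for one record."""
    for word in record["text"].lower().split():
        yield word.strip(".,?!"), record["id"]


def shuffle_and_reduce(pairs):
    """Group the emitted pairs by word, giving an inverted index."""
    index = defaultdict(set)
    for word, record_id in pairs:
        index[word].add(record_id)
    return index


def build_index(records):
    """Run the map step over every record, then group the results."""
    pairs = (pair for record in records for pair in map_phase(record))
    return shuffle_and_reduce(pairs)


def search(index, term):
    """Return the ids of all records containing the term, whatever their type."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    index = build_index(LAKE)
    print(search(index, "Friday"))   # ['email-1', 'memo-3']
    print(search(index, "meeting"))  # ['memo-3', 'text-7']
```

The point of the sketch is that one query reaches the email, the text message, and the memo together, with no separate search method per silo. A real cluster spreads the map and grouping work across many machines, which this single-process toy does not attempt.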
To be continued and concluded next week in Part Two.