There is a battle in the legal tech world between Information Governance and Search. It reflects a larger conflict in IT and all of society. Last year I came to believe that Information Governance’s preoccupation with classification, retention, and destruction of information was a futile pursuit. I challenged these activities as inefficient and doomed to failure in the age of information explosion. Instead of classify and kill, I embraced the googlesque approach of save and search.
I became wary of the whole approach of governing information as hostile to individual privacy rights and liberties. In my experience IG rules only seemed to serve the large entities who made them. For instance, IG rules typically state that employees have no reasonable expectation of privacy to any communications they may have at work, that all of their email accounts, even personal, can be searched at will. Their every keystroke can be monitored and recorded. Old school records policies seemed to encourage these draconian approaches. Under current U.S. law, these rules are usually enforceable.
Although I appeared to be a lonely searcher-voice in the legal technology world, which is, after all, not too surprising, since law itself is an attempt to govern, I had plenty of good company in the general technology world. There is not only Google, whom you would expect, but also EMC, GE, and a host of others. The debate is part of larger issues surrounding Big Data.
I took up arms against IG as I then knew it, which I understood to be an activity primarily designed to classify, control and delete records. I knew this conflict of approaches in how to treat information was important, and I felt compelled to speak out. Govern or Search is not just a legal issue. It is a cultural issue.
When I first spoke out with a contrarian voice, it created a controversy. Most in the legal establishment thought I was just plain wrong. Many wrote articles respectfully opposing my position. Many more were ready to argue, to fight even. Some did. I was even yelled at once at a CLE speakers dinner by a distinguished leader of IG who bristled at my challenges (some might say baiting). She insisted that everyone in her very large corporation could easily comply with her lengthy retention schedules. Oh brother.
The more thoughtful members of the IG leadership responded to the opposition with dialogue. This requires listening and trying to understand the points of the other side. I understand and favor dialogue, which is what attracted me to Sedona back in the day. I learned from this dialogue that IG, like Search, is not a monolith, that there are various factions and groups within IG.
After months of dialogue with the modern camp of IG, I have come to see that the contest between Search and IG need not be a fight to the death. I came to see a potential win/win outcome to this struggle. To those followers of IG who, like Jason R. Baron, have already transcended the old roles of traditional records keepers, there is no need to fight at all. My quarrel is, instead, with the old-liners, the Records Manager strata of IG who are obsessed with ESI classification and killing. To those who have let go of that traditional role, and already been reborn as multimodal, AI-enhanced Information experts, I have no quarrel. You could say that a partial settlement has been reached by a realignment of the parties.
My opposition continues only with the old-time record keepers with their long complex retention schedules and harsh top down rules. I will continue to oppose these caterpillars, no matter what smoke they may blow my way, unless and until they bow to the inevitable electronic metamorphosis. There has been no settlement with them. Trial in the world court of public opinion continues. I will oppose them for their own good. The librarians should relax, perhaps inhale a bit, cocoon, learn the new tech ways, and reemerge.
The battle against the new age Information Governors is, however, over; although I will remain watchful. Why? Because they in fact have already embraced the search and technology ways of “my side.” As Sun Tzu said: “The supreme art of war is to subdue the enemy without fighting.” Search and technology have won. Information has won. They are all one.
Underneath the superficial differences, and the annoying tendency of IG to claim every other field, including Search, as a subset of its own, both sides share almost all of the same values and concerns. Members of both sides are committed to cybersecurity and privacy, and do not see them as an either/or choice. That is critical. We must not sacrifice all of our privacy and individual rights in the name of security.
Where are the rights to both privacy and security in the challenge of too-much-information? I am a strong proponent of privacy, and so are many in the IG world. I am also a strong proponent of cybersecurity. I think it is possible to have both. In both the Search and IG camps there are people who agree with me on these points, and others who disagree. Many see it as one or the other, especially people in government. They take extreme views favoring either security or privacy. Many in both tech and government simply dismiss the importance of privacy, and say just get over it. Advocacy for individual privacy is a separate battle in both worlds, IG and Search. The same is true of cybersecurity. I favor a balanced approach, and so do many in the IG world.
The real battle is not between new IG and Search, it is between the extreme positions that can be found in both camps on the issues of privacy and security. I advocate for a middle ground, privacy and security, and so do many in the IG world. I am also apprehensive of the emergence of Big Brother from Big Data, but, as it turns out, so are many in the IG world. Our common ground is far greater than our differences. Thus a realignment of the parties to our common foes.
Death of a Caterpillar
The traditionalists in the IG world whom I continue to oppose, the ones who are glorified records managers, have another five years, at best, before complete obsolescence. The classify and control lock-down approach of records management is contrary to the time. It cannot withstand the continuing exponential growth of data, nor the basic entropy forces aligned against all attempts to govern by all-too-human rules and compliance. Records managers are caterpillars waiting to be reborn. They should withdraw into a cocoon and embrace the change.
My prediction is that within five years the traditional records management activities, specifically the classification, filing and obsessive deletion of data, will no longer be worth the effort. (I concede that some deletion is necessary and will continue.) It will be far more efficient to rely on advanced Search, than classify and kill. This five-year projection assumes continued exponential growth and complexity of ESI. Breakthroughs in search in the next five years would be nice too, but my prediction does not depend on that. It assumes instead a slow, steady improvement of search technologies. They are already awesome, when used properly. The caterpillar record managers will grow big and fly high with search if they will only allow themselves to have new eyes.
Alas, as of now the old-school IG’ers still see the world through paper glasses. They think that Information Governance is like paper records management, just with more zeros after the number of records involved. The file-everything librarian mentality lives on, or tries to. Yawn. There is a reason nobody in the C-Suite ever took records managers seriously. Dressing them up with new titles is not going to change anything. They have to really change and be reborn into the digital world. They need to learn to fly with search, instead of creeping along with filing rules. They need to embrace the new high-tech world of IG 2.0.
ESI Grows and Changes Too Fast for Traditional Governance
Electronic information is a totally new kind of force, something Mankind has never seen before. Digital Information is a Genie out of the bottle. It cannot be captured. It cannot be managed. It certainly cannot be governed. It cannot even be killed. Forget about trying to put it back in the bottle. It is breeding faster than even Star Trek’s Tribbles could imagine. As Baron and Paul discussed in their important 2007 law review article, ESI is like a new Universe, and we are living just moments after the Big Bang. George L. Paul and Jason R. Baron, Information Inflation: Can the Legal System Adapt? 13 RICH. J.L. & TECH. 10 (2007).
What many outside of Google, Baron, and Paul fail to grasp is that Information has a life of its own. Id. at FN 30 (quoting Ludwig Wittgenstein (a 20th Century Austrian philosopher whom I was forced to study while in college in Vienna): “[T]o imagine a language is to imagine a form of life.”) Electronic information is a new and unique life form that defies all attempts of limitation, much less governance. As James Gleick observed in his book on information science, everything is a form of information. The Universe itself is a giant computer and we are all self-evolving algorithms. Gleick, The Information: a history, a theory, a flood.
Many claim that information wants to be free. It does not want to be governed, or charged for. Information is more useful when free and when it is not subject to transitory restraints. Still, it must also be respected and safeguarded.
Stewart Brand of Whole Earth Catalog fame is credited with originating the phrase information wants to be free, but in fact his quote is taken out of context. His whole quote from the Whole Earth Review, May 1985, actually was:
On the one hand information wants to be expensive, because it’s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other.
Regardless of the economic aspects, and whether information really wants to be free, as a practical matter Information itself cannot be governed, even if some of it can be commoditized. Information is moving and growing far too fast for governance. But not too fast for search or security, at least I hope not. There are promising tech methods on the horizon that should guarantee privacy. See, e.g.: Entangled Photons on Silicon Chip: Secure Communications & Ultrafast Computers, The Hacker News, 1/27/15 (quantum entanglement encryption as the ultimate privacy solution).
Digitized information is like a nuclear reaction that has passed the point of no return. The chain reaction has been triggered. This is what exponential growth really means. In time such fission vision will be obvious. Even people without Google glasses will be able to see it. Just look at the extent of ESI proliferated during any minute of the world today as shown by the chart below. And the volume of ESI stored doubles at least every two years.
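To put the doubling claim in numbers, here is a minimal Python sketch of the arithmetic. It assumes a steady doubling every two years, which is the low end of the estimate above; the function name and the sample horizons are mine, chosen only for illustration.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Multiplier on stored ESI volume after `years`.

    Assumes steady exponential growth with a fixed doubling period.
    The two-year default is the "at least" figure from the text.
    """
    return 2 ** (years / doubling_period_years)


# Hypothetical horizons, picked only to illustrate the curve.
for years in (2, 5, 10):
    print(f"{years} years: x{growth_factor(years):.1f}")
```

On that assumption, a retention schedule written today would face between five and six times as much stored data in five years, and thirty-two times as much in ten.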
In the meantime we have records managers running around who serve like heroic bomb squads. Some know that it is just a noble quest, doomed to failure. Most do not. Some helicopter in and out of corporate worlds like wannabe Brian Williamses. They take flak (for real). They attempt to defuse ticking information bombs. They build walls around it. They confidently set policies and promulgate rules. They inventory it, map it, delete it. They talk sternly about enforcement of rules. (Of course, that never happens, which is one reason the whole effort is futile.) They automate deletion. They also try to automate filing. Some are even starting to make robot file clerks. But is it worth the effort? Might the time and money be better spent to protect our data from black hat hackers? To protect our privacy and individual rights?
The old school IG’ers are all working diligently to try to solve today’s problems of information management. But, all the while, ever new problems encroach upon their walls. They cannot keep up with this growth, the new forms of information. The next generation of exponential growth builds faster than anyone can possibly govern. Do they not know that the nuclear bomb has already exploded? The tipping point has already passed?
Information retention policies that are being created today are like sand castles built at low tide. Can you hear the next wave of data generated by the Internet of Things? It will surely wash away all of today’s efforts. There will always be more data, more unexpected new forms of information.
IG Through the Eyes of an AI-Enhanced Butterfly
I used to endorse the old ways myself. I used to be a caterpillar. ESI feared me. I was all about killing data as soon as you no longer had a business need for it. I was all in favor of short retention schedules. But, that was then. That was before I really mastered predictive coding, which in my version means active machine learning. That was before I understood much better than I used to, that we are living in a whole new world of Big Data Analytics.
I now realize that it is possible to dramatically reduce the costs of document review. I now realize the incredible power of AI-enhanced search. I am starting to realize the potential value of large pools of seemingly worthless data. These realizations change everything. I have been reborn as a butterfly with digital wings of AI.
Old school IG, by which I mean e-dressed-up records management, is not the way to deal with today’s all digital world. We are all suffering from information overload. We are all looking for a solution. Will we cope by Search and advanced technology, or by vertical forces of governance and man-made laws? This is an important question for everyone.
My understanding and experiences with Big Data analytics over the last few years have led me to understand that more data can mean more intelligence, that it does not necessarily mean more trouble and expense. I understand that more and bigger data has its own unique values, so long as it can be analyzed and searched effectively.
This change of position was reinforced by my observing many litigated cases where companies no longer had the documents they needed to prove their case. The documents had short retention spans. They had all been destroyed in the normal course of business before litigation was ever anticipated. I have seen first hand that yesterday’s trash can be tomorrow’s treasure. I will not even go into the other kinds of problems that very short retention policies create when a company has to implement a lit-hold immediately. The time pressures to get a hold in place can be enormous and thus errors become more likely.
There is a definite dark side to data destruction that many do not like to face. No one knows for sure when data has lost its value. The meaningless email of yesterday about lunch at a certain restaurant could well have a surprise value in the future. For instance, a time-line of what happened when, and to whom, is sometimes an important issue in litigation. These stupid lunch emails could help prove where a witness was and when. They could show that a witness was at lunch, out of the office, and not at a meeting as someone else alleges.
Who knows what value such seemingly worthless data may someday have? Perhaps millions of emails of ten thousand employees about lunch could be used someday to prove or disprove certain class-action allegations. Outside of the little world of litigation, perhaps the information could help management make smarter business decisions. For instance, they could help a company to decide whether to open a company cafeteria, and if so, what kind of food its employees would really like to have served there. Information can prove what really happened in the past and can help you to make the right decisions. With smart search, there can be great hidden value in too much information. Businesses are starting to see this now where Big Data mining is all the buzz. We lawyers need to start doing the same.
The point is, with the never-ending uncertainties of tomorrow, you can never know for sure what information is valueless and should be destroyed, and what information has value and should be saved. There may be an unimaginably large haystack of information, and you may think it only has a few valuable needles. But, you never really know. Today’s irrelevant straw could be tomorrow’s relevant needle. With the AI-based search capacities we already have, capacities that are sure to improve, when you need to find a needle in these near infinite stacks, you will be able to. The cost of storage itself has become so low as to become a negligible factor for most large corporations. Why destroy data when you can effectively search it and mine it for value? That is the butterfly view.
Information Technology View on Records Management v. Search
The general IT world is also struggling over whether to go all-in with Search, or to keep trying to solve the problem of too much information with records management. Unlike the legal world, where my vote for Search is still a new and small minority, in the IT world search is already a strong voice. Many in IT see attempts at information governance as a knee-jerk reaction from those still transitioning into the digital world. In the last year it seems to me that those favoring search over filing are gaining ground in the technology world. From what I see, the retain and search solution is surging ahead of the old-fashioned govern and destroy approach.
Consider, for instance, the policy of search stated by hot new companies like Pivotal, which is a joint venture between EMC, VMware, and GE. Pivotal’s public mantra is: Store Everything. Analyze Anything. Build the Right Thing.
Pivotal urges its customers to store everything, not just its organized databases, such as financial records. It provides the ability to store all types of data, including especially disorganized data, such as employee emails and texts, and do so in the same place. That is the new gold standard. Pivotal explains the value of store everything this way:
Store everything to create a rich data repository for business needs. With unlimited, supported Pivotal HD enterprises never have to worry about data growth constraints or runaway license costs.
Its suite of Big Data software is designed to allow a company to store all data types in the same place, which it, along with EMC, and others, have started calling a Data Lake. All types and formats of ESI become readable, searchable, in the Data Lake. They do not have to be stored separately, nor searched and analyzed separately. The Data Lakes are also infinitely expandable. Unlike real lakes, they cannot flood. They can instead grow unhindered in cyberspace. All they need are more servers.
These are major breakthroughs and mean the inevitable end of separate data silos by format type and size. This allows you to, in Pivotal’s words, leverage all your data, forever, and place it all in a centralized Business Data Lake. You can analyze multiple data sets and types that live in the Business Data Lake. This allows you to determine the integration value of multiple data sets and types. It also makes storage of Big Data much less expensive.
Bottom line, when all of your data is saved forever, and subject to advanced search analytics, you are empowered to build the right thing. In Pivotal’s words, building the right thing means to deliver a transformative solution to meet today’s demanding business needs. For business that means creation of new products, new advertising, new sales and business methods. For law it means building your case, finding evidence, and creating new legal methods. The promise of Big Data is changing everything in the tech world. Some in IG are also aware of these facts and are adapting ESI management accordingly.
AI-Enhanced Big Data Search Will Greatly Simplify Information Governance
The key problem all large organizations face is the challenge to find the information they need, when they need it, and do so in a cheap and efficient manner. Information needs are determined by both law and personal preferences, including business operation needs. In order to find information, you must first have it. Not only that, you must keep it until you need it. To do that, you need to preserve the information. If you have already destroyed information, really destroyed it I mean, not just deleted it, then obviously you will not be able to find it. You cannot find what does not exist, as all Unicorn chasers eventually find out.
This creates a basic problem for old-school IG because the whole system is based on a notion that the best way to find valuable information is to destroy worthless information. Much of old IG is devoted to trying to determine what information is a valuable needle, and what is worthless chaff. This is because everyone knows that the more information you have, the harder it is for you to find the information you need. The idea is that too much information will cut you off. These maxims were true in the pre-AI-Enhanced Search days, but are, IMO, no longer true today.
In order to meet the basic goal of finding information, old-school IG focuses its efforts on the proper classification of information. Again, the idea was to make it simpler to find information by preserving some of it, the information you might need to access, and destroying the rest. That is where records classification comes in.
The question of what information you need has a time element to it. The time requirements are again based on personal and business operations needs, and on thousands of federal, state and local laws. Information governance thus became a very complicated legal analysis problem. There are literally thousands of laws requiring certain types of information to be preserved for various lengths of time. Of course, you could comply with most of these laws by simply saving everything forever, but, in the past, that was not a realistic solution. There were severe limits on the ability to save information, and the ability to find it. Also, it was presumed that the older information was, the less value it had. Almost all information was thus treated like news.
These ideas were all firmly entrenched before the advent of Big Data and AI-enhanced data mining. In fact, in today’s world there is good reason for Google to save every search, ever done, forever. Some patterns and knowledge only emerge in time and history. New information is sometimes better information, but not necessarily so. In the world of Big Data all information has value, not just the latest.
The records life-cycle ideas all made perfect sense in the world of paper information. It cost a lot of money to save and store paper records. Everyone with a monthly Iron Mountain paper records storage bill knows that. Even after the computer age began, it still cost a fair amount of money to save and store ESI. Digital storage used to be very expensive to buy and maintain. Finding the ESI you needed quickly on a computer was still very difficult and unreliable. All we had at first was keyword search, and that was very ineffective.
Due to the costs of storage, and the limitations of search, tremendous efforts were made by record managers to try to figure out what information was important, or needed, either from a legal perspective, or a business necessity perspective, and to save that information, and only that information. The old idea behind IG was to destroy the ESI you did not need or were not required by law to preserve. This destruction saved you money, and, it also made possible the whole point of IG, to find the information you wanted, when you wanted it.
Back in the pre-AI search days, the more information you had, the harder it was to find the information you needed. That still seems like common sense. Useless information was destroyed so that you could find valuable information. In reality, with the new and better algorithms we now have for AI-enhanced search, it is just the reverse. The more information you have, the easier it becomes to find what you want. You now have more information to draw upon.
That is the new reality of Big Data. It is a hard intellectual paradigm to jump, and seems counter-intuitive. It took me a long time to get it. The new ability to save and search everything cheaply and efficiently is what is driving the explosion of Big Data services and products. As the save everything, find anything way of thinking takes over, the classification and deletion aspects of IG will naturally dissipate. The records life-cycle will transform into virtual immortality. There is no reason to classify and delete, if you can save everything and find anything at low cost. The issues simplify; they change to how to save and search, although new collateral issues of security and privacy grow in importance.
Recent Breakthroughs in Artificial Intelligence Make Possible Save Everything, Find Anything
The New York Times in an opinion editorial in late 2014 discussed recent breakthroughs in Artificial Intelligence and speculated on alternative futures this could create. Our Machine Masters, NY Times Op-Ed, by David Brooks (October 31, 2014). The Times article quoted extensively from another article in Wired by technology blogger Kevin Kelly: The Three Breakthroughs That Have Finally Unleashed AI on the World. Kelly argues, as do I, that artificial intelligence has now reached a breakthrough level. This artificial intelligence breakthrough, Kevin Kelly argues, and David Brooks agrees, is driven by three things: cheap parallel computation technologies, big data collection, and better algorithms. The upshot is clear in the opinion of both Wired and the New York Times: “The business plans of the next 10,000 start-ups are easy to forecast: Take X and add A.I. This is a big deal, and now it’s here.”
These three new technology advances change everything. The Wired article goes into the technology and financial aspects of the new AI; it is where the big money is going and will be made in the next few decades. If Wired is right, then this means in our world of e-discovery, companies and law firms will succeed if, and only if, they add AI to their products and services. The firms and vendors who add AI to document review, and project management, will grow fast. The non-AI enhanced vendors, non-AI enhanced software, will go out of business. The law firms that do not use AI tools will shrink and die. The same goes for IG.
The three big new advances that are allowing better and better AI are nowhere near to threatening the jobs of human judges or lawyers, although they will likely reduce their numbers, and certainly will change their jobs. We are already seeing these changes in Legal Search and Information Governance. Thanks to cheap parallel computation, we now have Big Data Lakes stored in thousands of inexpensive cloud computers that are operating together. This is where open-sourced software like Hadoop comes in. It makes the big clusters of computers possible. Better algorithms are where better AI-enhanced software comes in. This makes it possible to use predictive coding effectively and inexpensively to find the information needed to resolve lawsuits. The days of vast numbers of document reviewer attorneys doing linear review are numbered. Instead, we will see a few SMEs, working with small teams of reviewers, search experts, and software experts.
The role of Information Managers will also change drastically. Because of Big Data, cheap parallel computing, and better algorithms, it is now possible to save everything, forever, at a small cost, and to quickly search and find what you need. The new reality of Save Everything, Find Anything undercuts most of the rationale of the old paradigm of Information Governance, but not the new. The new paradigm of IG gets it, and relies on AI technology.
The save everything forever AI search model of new IG will create a variety of new legal work for lawyers, but they will be the next generation of tech lawyers. The cybersecurity protection and privacy aspects of Big Data Lakes are already creating many new legal challenges and issues. Big Data breaches already mean Big Money for the law firms who offer curative services. That is happening now. In the future lawyers will play a larger role in preventative security issues. More legal issues are sure to arise with the expansion of Big Data, AI, and development of the next generation of IG. From what I have seen technology creates new jobs as fast as it eliminates old ones. The real challenge is keeping up with the changes.