This is the second part of a two-part blog, please read part one first.
AI-Enhanced Big Data Search Will Greatly Simplify Information Governance
Information Governance is, or should be, all about finding the information you need, when you need it, and doing so in a cheap and efficient manner. Information needs are determined by both law and personal preferences, including business operation needs. In order to find information, you must first have it. Not only that, you must keep it until you need it. To do that, you need to preserve the information. If you have already destroyed information, really destroyed it I mean, not just deleted it, then obviously you will not be able to find it. You cannot find what does not exist, as all Unicorn chasers eventually find out.
This creates a basic problem for Information Governance because the whole system is based on a notion that the best way to find valuable information is to destroy worthless information. Much of Information Governance is devoted to trying to determine what information is a valuable needle, and what is worthless chaff. This is because everyone knows that the more information you have, the harder it is for you to find the information you need. The idea is that too much information will cut you off. These maxims were true in the pre-AI-Enhanced Search days, but are, IMO, no longer true today, or, at least, will not be true in the next five to ten years, maybe sooner.
In order to meet the basic goal of finding information, Information Governance focuses its efforts on the proper classification of information. Again, the idea was to make it simpler to find information by preserving some of it, the information you might need to access, and destroying the rest. That is where records classification comes in.
The question of what information you need has a time element to it. The time requirements are again based on personal and business operations needs, and on thousand of federal, state and local laws. Information governance thus became a very complicated legal analysis problem. There are literally thousands of laws requiring certain types of information to be preserved for various lengths of time. Of course, you could comply with most of these laws by simply saving everything forever, but, in the past, that was not a realistic solution. There were severe limits on the ability to save information, and the ability to find it. Also, it was presumed that the older information was, the less value it had. Almost all information was thus treated like news.
These ideas were all firmly entrenched before the advent of Big Data and AI-enhanced data mining. In fact, in today’s world there is good reason for Google to save every search, ever done, forever. Some patterns and knowledge only emerge in time and history. New information is sometimes better information, but not necessarily so. In the world of Big Data all information has value, not just the latest.
This records life-cycle ideas all made perfect sense in the world of paper information. It cost a lot of money to save and store paper records. Everyone with a monthly Iron Mountain paper records storage bill knows that. Even after the computer age began, it still cost a fair amount of money to save and store ESI. The computers needed to buy and maintain digital storage used to be very expensive. Finding the ESI you needed quickly on a computer was still very difficult and unreliable. All we had at first was keyword search, and that was very ineffective.
Due to the costs of storage, and the limitations of search, tremendous efforts were made by record managers to try to figure out what information was important, or needed, either from a legal perspective, or a business necessity perspective, and to save that information, and only that information. The idea behind Information Management was to destroy the ESI you did not need or were not required by law to preserve. This destruction saved you money, and, it also made possible the whole point of Information Governance, to find the information you wanted, when you wanted it.
Back in the pre-AI search days, the more information you had, the harder it was to find the information you needed. That still seems like common sense. Useless information was destroyed so that you could find valuable information. In reality, with the new and better algorithms we now have for AI-enhanced search, it is just the reverse. The more information you have, the easier it becomes to find what you want. You now have more information to draw upon.
That is the new reality of Big Data. It is a hard intellectual paradigm to jump, and seems counter-intuitive. It took me a long time to get it. The new ability to save and search everything cheaply and efficiently is what is driving the explosion of Big Data services and products. As the save everything, find anything way of thinking takes over, the classification and deletion aspects of Information Governance will naturally dissipate. The records lifecycle will transform into virtual immortality. There is no reason to classify and delete, if you can save everything and find anything at low cost. The issues simplify; they change to how to save and search, although new collateral issues of security and privacy grow in importance.
Save and Search v. Classify and Delete
The current clash in basic ideas concerning Big Data and Information Governance is confusing to many business executives. According to Gregory Bufithis who attended a recent event in Washington D.C. on Big Data sponsored by EMC, one senior presenter explained:
The C Suite is bedeviled by IG and regulatory complexity. …
The solution is not to eliminate Information Governance entirely. The reports of its complete demise, here or elsewhere, are exaggerated. The solution is to simplify IG. To pare it down to save and search. Even this will take some time, like I said, from five to ten years, although there is some chance this transformation of IG will go even faster than that. This move away from complex regulatory classification schemes, to simpler save and search everything, is already being adopted by many in the high-tech world. To quote Greg again from the private EMC event in D.C. in October, 2014:
Why data lakes? Because regulatory complexity and the changes can kill you. And are unpredictable in relationship to information governance. …
So what’s better? Data lakes coupled with archiving. Yes, archiving seems emblematic of “old” IT. But archiving and data lifecycle management (DLM) have evolved from a storage focus, to a focus on business value and data loss prevention. DLM recognizes that as data gets older, its value diminishes, but it never becomes worthless. And nobody is throwing out anything and yes, there are negative impacts (unnecessary storage costs, litigation, regulatory sanctions) if not retained or deleted when it should be.
But … companies want to mine their data for operational and competitive advantage. So data lakes and archiving their data allows for ingesting and retain all information types, structured or unstructured. And that’s better.
Because then all you need is a good search platform or search system … like Hadoop which allows you to sift through the data and extract the chunks that answer the questions at hand. In essence, this is a step up from OLAP (online analytical processing). And you can use “tag sift sort” programs like Data Rush. Or ThingWorx which is an approach that monitors the stream of data arriving in the lake for specific events. Complex event processing (CEP) engines can also sift through data as it enters storage, or later when it’s needed for analysis.
Because it is all about search.
Recent Breakthroughs in Artificial Intelligence
Make Possible Save Everything, Find Anything
The New York Times in an opinion editorial this week discussed recent breakthroughs in Artificial Intelligence and speculated on alternative futures this could create. Our Machine Masters, NT Times Op-Ed, by David Brooks (October 31, 2014). The Times article quoted extensively another article in the current issue of Wired by technology blogger Kevin Kelly: The Three Breakthroughs That Have Finally Unleashed AI on the World. Kelly argues, as do I, that artificial intelligence has now reached a breakthrough level. This artificial intelligence breakthrough, Kevin Kelly argues, and David Brook’s agrees, is driven by three things: cheap parallel computation technologies, big data collection, and better algorithms. The upshot is clear in the opinion of both Wired and the New York Times: “The business plans of the next 10,000 start-ups are easy to forecast: Take X and add A.I. This is a big deal, and now it’s here.”
These three new technology advances change everything. The Wired article goes into the technology and financial aspects of the new AI; it is where the big money is going and will be made in the next few decades. If Wired is right, then this means in our world of e-discovery, companies and law firms will succeed if, and only if, they add AI to their products and services. The firms and vendors who add AI to document review, and project management, will grow fast. The non-AI enhanced vendors, non-AI enhanced software, will go out of business. The law firms that do not use AI tools will shrink and die.
The Times article by David Brooks goes into the sociological and philosophical aspects of the recent breakthroughs in Artificial Intelligence:
Two big implications flow from this. The first is sociological. If knowledge is power, we’re about to see an even greater concentration of power. … [E]ngineers at a few gigantic companies will have vast-though-hidden power to shape how data are collected and framed, to harvest huge amounts of information, to build the frameworks through which the rest of us make decisions and to steer our choices. If you think this power will be used for entirely benign ends, then you have not read enough history.
The second implication is philosophical. A.I. will redefine what it means to be human. Our identity as humans is shaped by what machines and other animals can’t do. For the last few centuries, reason was seen as the ultimate human faculty. But now machines are better at many of the tasks we associate with thinking — like playing chess, winning at Jeopardy, and doing math. [RCL – and, you might add, better at finding relevant evidence.]
On the other hand, machines cannot beat us at the things we do without conscious thinking: developing tastes and affections, mimicking each other and building emotional attachments, experiencing imaginative breakthroughs, forming moral sentiments. [RCL – and, you might add, better at equitable notions of justice and at legal imagination.]
Although I share the concerns of the NY Times about mastering machines and alternative future scenarios, my analysis of the impact of the new AI is focused and limited to the Law. Lawyers must master the AI-search for evidence processes. We must master and use the better algorithms, the better AI-enhanced software, not visa versa. The software does not, nor should it, run itself. Easy buttons in legal search are a trap for the unwary, a first step down a slippery slope to legal dystopia. Human lawyers must never over-delegate our uniquely human insights and abilities. We must train the machines. We must stay in charge and assert our human insights on law, relevance, equity, fairness and justice, and our human abilities to imagine and create new realities of justice for all. I want lawyers and judges to use AI-enhanced machines, but I never want to be judged by a machine alone, nor have a computer alone as a lawyer.
The three big new advances that are allowing better and better AI are nowhere near to threatening the jobs of human judges or lawyers, although they will likely reduce their numbers, and certainly will change their jobs. We are already seeing these changes in Legal Search and Information Governance. Thanks to cheap parallel computation, we now have Big Data Lakes stored in thousands of inexpensive, cloud computers that are operating together. This is where open-sourced software like Hadoop comes in. They make the big clusters of computers possible. The better algorithms is where better AI-enhanced Software comes in. This makes it possible to use predictive coding effectively and inexpensively to find the information needed to resolve law suits. The days of vast numbers of document reviewer attorneys doing linear review are numbered. Instead, we will see a few SMEs, working with small teams of reviewers, search experts, and software experts.
The role of Information Managers will also change drastically. Because of Big Data, cheap parallel computing, and better algorithms, it is now possible to save everything, forever, at a small cost, and to quickly search and find what you need. The new reality of Save Everything, Find Anything undercuts most of the rationale of Information Governance. It is all about search now.