The cover article in this week’s The Economist magazine is The data deluge, featuring a 14-page special report on information management entitled Data, data, everywhere. This article, and our collective situation of data overload, reminds me of one of my favorite Star Trek episodes, The Trouble With Tribbles. The cute little ESI files that everyone owns are now growing out of control and as a result our institutions are imperiled. There is great good that can come out of all this data, as the article points out, but a whole lot of tribble too. (Sorry.)
The Economist article talks about the insights that can be teased out of the data deluge, while at the same time pointing out that no one really knows how to manage it. Information management is a key part of the e-discovery world. In fact, it is the first step in the nine-fold Electronic Discovery Reference Model. We all know that ESI should be better managed, but it seems nigh impossible to do right. For some, it is a mess because they don’t even try. But for others, especially large organizations, it is beyond their capacity no matter how hard they try, primarily because of the information explosion. How can you manage something that multiplies faster than you can count? How do you find something that moves and morphs into something new when you are not looking? One day it’s a Word doc, the next a wiki, the next a tweet.
Data Are Like Tribbles
For most people data are like tribbles – cheap, easy to come by, cute and endearing, at least at first, but then after a while, they can become a real nightmare. You may think that you have your ESI managed, and that you can find certain tribbles if you need to, but you are probably just kidding yourself. Chances are they are growing faster than you can imagine and anyway they all tend to look alike, even when they aren’t. (Even Jason Baron and I can’t imagine how fast the data universe is growing and we’ve both spent way too much time thinking about it. See: e-Discovery: Did You Know?)
Yes, information is our generation’s tribble. As Dr. McCoy said, “they are born pregnant.” One little email can become dozens before you know it. Also, like the tribbles in Star Trek who somehow squeezed into the overhead bins, ESI nowadays is migrating into unlikely places, such as employee phones, music players, home media centers, even the clouds. If the law demands that you preserve and produce these pesky ESI tribbles in a court proceeding, the chances are high that you won’t catch them all. It’s also likely to cost too much to even try. That’s the trouble with tribbles: they seem so innocuous, just like data, but before you know it, they multiply out of control and destroy your ship. No wonder the Klingons hate them.
New Kind of Professional
The Economist article notes that a new kind of professional is emerging, one very important to business, and I suggest also to law, the data scientist. This insight supports my position that metrics, including multimodal search, is one of the three pillars of e-discovery best practices.
Chief information officers (CIOs) have become somewhat more prominent in the executive suite, and a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Substitute relevant evidence for wisdom and you have the challenge of e-discovery in a nutshell. If you can actually find the right tribble, the needle-of-relevance in the haystack-of-noise, then, according to the author of The Economist article, Kenneth Cukier, you would be a corporate sex-icon. Personally, I have my doubts. For those of us in the world of e-discovery, that is just another day in the office. Still, this is the sweet-spot in discovery today – the ultimate challenge: “to extract the nuggets of gold hidden under mountains of data.” Those who can do this, who can find the most smoking guns for the least amount of money, are the critical players for the 98% of lawsuits that never go to trial. The Perry Mason jury trial lawyer types are good for the other 2%, but even they are dependent on the smoking gun documents for their gotcha cross-exams.
This point is not lost on scientists who generate huge amounts of information on a daily basis. For instance, according to The Economist, the latest Large Synoptic Survey Telescope, set to open in Chile in 2016, is designed to generate 28 terabytes of data a day. Facts like this cause Alex Szalay, an astrophysicist at Johns Hopkins University, to observe what the law has already noted and embodied in Rule 26(b)(2)(B), namely that the proliferation of data is making them increasingly inaccessible. Szalay goes on to conclude, as do I, that education is key and must change to better train the next generation:
How to make sense of all these data? People should be worried about how we train the next generation, not just of scientists, but people in government and industry.
The Economist also quotes T. S. Eliot, who in 1934 wrote in his poem The Rock:
Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?
Where indeed? That is the golden question today in e-discovery. Where is the highly relevant evidence and how do we find it? In my view the answer lies in a multimodal Where’s Waldo? approach, one that is bottom line driven based on cost and Rule 26(b)(2)(C). But that’s another story.
The Positive Side of Big Data
The article focuses on both the good and bad sides of the deluge. The positive side is often overlooked:
… the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.
The Economist website also includes an audio interview with Kenneth Cukier. He thinks that Big Data, as he calls it, is just starting and is not really here yet. When it comes, he predicts it will change everything. To which I add “again.” The introduction of the interview (after a short commercial) summarizes the Economist article:
We will have more data than ever before, we will be able to tease out new insights with it that we could not ever do before, but it is going to create huge new headaches because we simply don’t know how to handle it.
The positive aspects of the data deluge, the ability to tease out new insights, are often forgotten in the legal arena. We tend to focus on the headaches. The data deluge allows the law to have new insights too, not just the headaches of simply not knowing how to handle all the tribbles.
The Economist article discusses some of the many ways that business innovators are teasing out the insights. Google seems to be the corporate leader. The positive side in the legal profession is that lawyers are teasing out evidence that would otherwise never be found. In fact, the evidence would not even exist. The smoking gun emails found in many controversies today are a prime example.
In the past paper-world the private, informal, wish-I-hadn’t-said-that types of communications were never recorded. They were just sound waves or telephone calls. They were admissions that never became writings. They were near impossible to discover to impeach recalcitrant memories. They may have been suspected, but were easy to deny and near impossible to prove. The world has become far more literate and writing oriented in the last thirty years. People write more today than in the past, and write more informally and with less thought. According to The Economist:
The amount of reading people do, previously in decline because of television, has almost tripled since 1980, thanks to all that text on the Internet.
The wish-I-hadn’t-said-that types of e-communications are some of the new insights that lawyers can tease out of Big Data. They are insights into previously secret, private communications. This is discovery into true intentions. The new insights of e-discovery reveal what people were really saying and doing in the past by mining their emails, IMs, text messages, Facebooks, twitters, etc.
Electronic records create a history that never existed before. The relevant histories can sometimes be very hard to find because there are so many of them. But for most people today, the histories are there for Googlesque lawyers to find. If you know where and how to look, it is really no tribble at all.