We live in a world of explosions, from nuclear bombs to improvised explosive devices (IEDs). The IEDs at the side of the road are less dramatic and sophisticated than nuclear weapons, but deadly just the same. In 2009, there were 7,228 IED attacks in Afghanistan alone. By 2010, IED attacks in Afghanistan had killed 268 U.S. soldiers and wounded 3,366. On this Memorial Day 2011, we salute our military who face them bravely and pay tribute to those who gave their lives in service. The IED explosions are, thankfully, remote for most of us.
But one explosion, an abstraction really, and seemingly harmless by comparison, is felt by everyone every day: the Information Explosion. The impact of this explosion on the law has not been good. Yet the article by Maura R. Grossman and Gordon V. Cormack discussed in this blog gives us reason to hope that the legal profession may yet survive the blast. Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011).
Wikipedia defines explosion as a rapid increase in volume and release of energy in an extreme manner. The volume of information in the world, especially the volume of electronically stored information (ESI) on our computers, has rapidly increased. Of that, there can be no doubt. See, e.g., my video with Jason R. Baron: e-Discovery: Did you Know?, or see Cisco’s article, The Explosion of Data, which claims that every five minutes, we create an explosion of digital data equivalent to all of the information stored in the Library of Congress. Or listen to Eric Schmidt, former CEO of Google, who claims that we now create as much information in two days as we did from the dawn of man through 2003.
But what of the rapid increase in release of energy in an extreme manner characteristic of an explosion? That is the bad, damaging part of explosions, where the release of too much energy too fast creates a fireball of destruction. With IEDs, the harm is obvious. The explosions cause indiscriminate death and injury. But what harm does an information explosion cause? What harm is there in the release of too much information, too fast? What is the harm of information overload?
Alvin Toffler is a futurist who saw this coming back in the sixties. He popularized the term information overload in his bestselling book Future Shock. Future shock was defined by Toffler as “the shattering stress and disorientation that we induce in individuals by subjecting them to too much change in too short a time.” Information overload was a related kind of malady, a psychological disorder where a person has difficulty understanding an issue and making decisions because they have too much information (TMI). Toffler wrote:
When the individual is plunged into a fast and irregularly changing situation, or a novelty-loaded context … his predictive accuracy plummets. He can no longer make the reasonably correct assessments on which rational behavior is dependent.
Toffler, A., Future Shock, pp. 350-1 (1970). Toffler’s work was prophetic. He was speculating on what the future could bring. But the future is here now and we need speculate no longer. Just look around. Stress and disorientation from rapid change are common, especially in the over 40 set. Information overload is common. Just think of Microsoft’s television commercial promoting its search method over Google’s and the dangers of the “search overload syndrome.”
The insidious effects of too much information are becoming obvious. It overloads our ability to think, to understand, and to really learn new things, as opposed to just looking them up. The information flood is stressful, but not deep. It is superficiality incarnate. Those who drink at its spigot only seem learned. In fact, they know little and have trouble making decisions. The amount and variety of conflicting information bits make decisions difficult.
Those of us suffering the explosive blast of the information explosion see way too much, but observe too little. Even Bin Laden could hide in plain view. We may know how to say all of the right words, but do we really understand what they mean? Is it all just teleprompter driven? Do you know what I mean?
Information overload makes it hard to decide. It makes it harder still to stick with your decisions. Harder to know if they were correct. Some information says yes. Some says no. What is true? What is right?
Information overload may make it hard to think for yourself at all. Who is in? Who is out? What should I buy? What to do? Who do I like and why? Now we all beseech the God Google for advice. It pays to listen to what its leaders say.
Google Worries: What do they know that we don’t?
Eric Schmidt was the grown-up in charge of Google from its early days until just this year. Google is the company that knows what we are searching for, and arguably, the company that profits the most from information overload. Forty years after Toffler, Schmidt describes his worries and views from the heights of Google:
I worry that the level of interrupt, the sort of overwhelming rapidity of information — and especially of stressful information — is in fact affecting cognition. It is in fact affecting deeper thinking. I still believe that sitting down and reading a book is the best way to really learn something. And I worry that we’re losing that. (March 6, 2009 interview)
The one thing that I do worry about is the question of ‘deep reading.’ As the world looks to these instantaneous devices … you spend less time reading all forms of literature, books, magazines and so forth. That probably has an effect on cognition, probably has an effect on reading. (January 29, 2010 interview)
Are we reading quickie web pages and tweets now instead of books? Do we watch tv, listen to music, surf the web, or more likely, all of the above? Or do we do deep reading? What do we lose by this transition of media? Is this change in learning modalities inherently bad for our cognition as Schmidt suggests?
The merit is in the message, not the media. But see: McLuhan, M., The Medium Is The Message (1964). It depends primarily on the content itself, not the media per se. If the author has something valuable to say, and says it well, then I will go to that content no matter what the media. Paper book, or e-book, or web, or whatever. If the content is good, then it is all-good, despite what Eric said.
For most of us, it is both-and, not either-or. We can still spend long periods of time reading. We can focus our attention solely on the written word and the imagination and thoughts this provokes. We are capable of prolonged intellectual absorption, deep reading. We can also do quick-reading. We can also do deep browsing. We can keep the cognition going by engaging in all types of learning, book-based, experiential, air-trained, etc. We had better be adept at all kinds of learning, for as Alvin Toffler said forty years ago about the future that is now:
The illiterate of the 21st century
will not be those who cannot read and write,
but those who cannot learn, unlearn, and relearn.
You need to be able to concentrate on a problem for long periods of time for this kind of re-indoctrination of the mind to take place, this learning, unlearning and relearning. If we lose the ability to do that, we cannot cope with the ever-accelerating pace of change and cannot avoid future shock.
The Real Worry
With due respect for the opinion of Google’s Eric Schmidt, the real worry or concern should be the difficulty many people today, especially young people, have doing anything in a deep, prolonged manner, not just reading books. (Video games are an exception, but they rarely teach anything more than strategy and quick reflexes.) This malady is one of the insidious effects of information overload.
Too much information can have a negative impact on our ability to think, to understand, and to really learn new things, as opposed to just looking them up. The condition of “cognitive overload” results in diminished information retention ability. It can diminish our ability to connect remembrances to experiences stored in the long-term memory, leaving thoughts “thin and scattered.” Also see: The Shallows: What the Internet Is Doing to Our Brains by Nicholas Carr.
So, while you may think the Information Explosion is harmless, it is not, and, unlike IEDs, it affects virtually everyone in the world, soldier and civilian alike.
Ready to Think Deep Thoughts?
We all struggle with information overload each day. The best antidote seems to be to force yourself, at least from time to time, to read a book, the harder the material the better. Or, if you are struggling and want a real challenge, read a law review article. Here is the one to read now: Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011) by Maura R. Grossman and Gordon V. Cormack. This article is the perfect antidote for those of us in e-discovery because it offers hope to cope with overload by use of machines. Watson to the rescue!
I have talked about this article before in Ruminations on James Gleick’s New Book “The Information,” Chaos, Life and e-Discovery, but now I am going to dive deep and encourage you to follow, to see the details for yourself. So turn off the phone and devices. Fight information overload and clear your head with sustained concentration. Get ready to learn, unlearn, and relearn what you thought you knew about legal search.
Myth of Human Infallibility
The title of the article summarizes its contents: Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review.
The article begins with the myth of infallibility, that human review of large volumes of ESI for relevance and other legal classifications is more reliable than automated review. Maura R. Grossman, a well-known e-discovery lawyer who also has a Ph.D. in Psychology, and Gordon V. Cormack, a distinguished computer science professor, make short work of the myth:
[T]he literature contains little scientific evidence to support or refute the notion that automated methods, while improving on the efficiency of manual review, yield inferior results. This work presents evidence supporting the contrary position: that a technology-assisted process, in which only a small fraction of the document collection is ever examined by humans, can yield higher recall and/or precision than an exhaustive manual review process, in which the entire document collection is examined and coded by humans.
A technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege. A human examines and codes only those documents that are identified by the computer – a tiny fraction of the entire collection. Using the results of this human review, the computer codes the remaining documents in the collection for responsiveness (or privilege). A technology-assisted review process may involve, in whole or in part, the use of one or more approaches including, but not limited to, keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, and sampling.
This insight shows law and lawyers a way out of the danger zone of the information explosion. Computers got us into the TMI mess, got us dizzy with information overload. Computers can get us out.
Scientific research into the efficacy of computer-assisted review has been going on since 2006, with the most serious work done by the Text Retrieval Conference (“TREC”), sponsored by the National Institute of Standards and Technology (“NIST”). Grossman and Cormack have been involved in the TREC Legal Track for several years and are well qualified to interpret its results. The good news is that, according to Grossman and Cormack, the latest results from TREC show that technology-assisted processes can achieve high levels of recall and precision, levels at least as good as, and probably better than, those achieved by human reviewers. In their words:
the levels of performance achieved by two technology-assisted processes exceed those that would have been achieved by the official TREC assessors – law students and lawyers employed by professional review companies – had they conducted a manual review of the entire document collection.
Id. at pg. 4.
Basic Problem in e-Discovery Review
Grossman and Cormack then go on to state the basic problem of e-discovery, an issue M.R. Grossman knows well from her years in private practice.
One of the greatest challenges facing legal stakeholders is determining whether or not the cost and burden of identifying and producing electronically stored information (“ESI”) is commensurate with its importance in resolving the issues in dispute. In current practice, the problem of identifying responsive (or privileged) ESI, once it has been collected, is almost always addressed, at least in part, by a manual review process, the cost of which dominates the e-discovery process. A natural question to ask, then, is whether this manual review process is the most effective and efficient one for identifying and producing the ESI most likely to resolve a dispute.
The answer, the authors say, is no. Manual review is obviously not the most efficient, even with contract lawyers, but surprisingly, it is also not the most effective. Human review as the gold standard is a myth.
Human Reviewers Disagree On Relevance
The authors support this debunking of the gold standard on a number of grounds. The first is prior research that shows how often different human reviewers disagree about the relevance of the same documents. Apparently the relevance assessments of trained reviewers have an overlap of less than 50%; that is, of the documents that at least one of two reviewers considers relevant, the two reviewers agree on fewer than half.
Still, it’s hard to believe, isn’t it? Yet their analysis of the data from a study by Roitblat, Kershaw and Oot showed only a 28% overlap, meaning matching relevancy determinations, between two teams of reviewers:
[T]he overlap between Teams A and B was 28.1%. These and other studies of overlap indicate that relevance is not a concept that can be applied consistently by independent assessors, even if the information need is specified by a production request and the assessors are lawyers.
This is consistent with the findings of other information scientists on TREC studies done outside of the legal context, including Ellen M. Voorhees:
The scores for the [secondary and tertiary] judgments imply a practical upper bound on retrieval system performance is 65% precision at 65% recall since that is the level at which humans agree with one another.
Ellen M. Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, 36:5 Information Processing & Management 697, 701 (2000). This suggests that there is a limit imposed on automated precision and recall scoring, not by the software code or automated processes, but by the human judges.
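For readers who want to see the arithmetic behind these measures, here is a minimal sketch in Python. It assumes that overlap means the documents both reviewers marked relevant divided by the documents at least one of them marked relevant, which matches the description above, and it uses the standard definitions of precision and recall. The document IDs and the two teams' calls are made up for illustration; they are not data from the Roitblat, Kershaw and Oot study or from TREC.

```python
def overlap(a, b):
    """Documents both reviewers marked relevant, divided by the documents
    at least one of them marked relevant (intersection over union)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance calls (document IDs) by two review teams.
team_a = {1, 2, 3, 4, 5, 6}
team_b = {5, 6, 7, 8, 9}

print(f"overlap: {overlap(team_a, team_b):.1%}")

# Treating Team A's calls as the "truth" and scoring Team B against them.
p, r = precision_recall(team_b, team_a)
print(f"precision: {p:.1%}, recall: {r:.1%}")
```

Note that scoring one team against the other only makes sense if you accept the first team's calls as correct, which is exactly the assumption the overlap studies call into question.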
To help bring this concept home, Maura and Gord created a website where you can go do some TREC Legal Track review yourself. When you visit the website, you will be asked to review 10 real TREC documents (the RFP appears at the upper right corner) where the computer and human reviewers disagreed on the relevancy call. Take a look at the RFP and then make your relevancy call on these 10 documents. The website will then tell you whether you coded them more like the computer program did or more like the contract reviewer did. If you find the game addictive, you can play as many times as you’d like with 10 new randomly selected documents. (Now that is a new kind of immersion that no paper book can bring.)
Maura and Gord also debunked the myth by comparing the results of human reviewers at TREC, the volunteers who read each document, with the teams who used manual and computer-assisted search in iterative cycles. The TREC research showed that:
technology-assisted review can achieve at least as high recall as manual review, and higher precision, at a fraction of the review effort, and hence, a fraction of the cost.
Id. at 31.
In fact, according to the authors’ analysis of TREC 2009 data:
There is not one single measure for which manual review is significantly better than technology-assisted review.
The time savings from the efficiencies of computer-assisted review were substantial, although not exactly known, since not all review teams tracked or reported their hours. What we do know is that approximately fifty times fewer documents had to be read with computer-assisted methods. In other words, the computer-assisted reviewers only had to read one document for every 50 read by the pure manual reviewers. Bottom line, computer-assisted review was faster and more precise, with recall at least as good as manual review. Id. at 30.
What Search Methods Have Worked Best at TREC
The research is still inconclusive about what search methods work best. But in general, the scientists looking at this have found that methods combining human review with automated review worked best. One method that scored high on both precision and recall was used by the vendor, H5. In their words, their method:
… utilizes an iterative issue-focusing and data-focusing methodology that defines relevancy in detail for a brief discussion of issue- and data-focusing strategies . . .whereas many alternative approaches start with a small number . . . of keywords intended to be broad so as to capture a lot of relevant data (with the consequence of many false positives), H5’s approach is focused on developing in an automated or semi-automated fashion large numbers of deterministic queries that are very precise: each string may capture just a few documents, but nearly all documents so captured will be relevant; and all the strings together will capture most relevant documents in the collection.
Id. at pg. 20.
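The contrast drawn in that quote, a few broad keywords versus many narrow queries, can be illustrated with a toy sketch. This is not H5's actual method, which the quote describes only in general terms; the documents, the query strings, and the simple substring matching below are all hypothetical.

```python
# Hypothetical mini-collection: document ID -> text.
docs = {
    "d1": "please shred the audit workpapers before friday",
    "d2": "lunch on friday?",
    "d3": "move the losses to the special purpose entity off the books",
    "d4": "the audit committee meets friday",
    "d5": "book the flight to houston",
}


def run_queries(queries, docs):
    """Return the IDs of documents matched by at least one query string."""
    return {doc_id for doc_id, text in docs.items()
            if any(query in text for query in queries)}


# A few broad keywords sweep in nearly everything (many false positives).
broad_hits = run_queries(["friday", "book"], docs)

# Several narrow strings each catch only a document or two, but precisely.
precise_hits = run_queries(["shred the audit", "off the books"], docs)

print(sorted(broad_hits))
print(sorted(precise_hits))
```

In this toy collection the two broad keywords hit every document, while the two narrow strings hit only d1 and d3. A real collection would need far more narrow strings to reach good recall, which is the point of the quote.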
A team from the University of Waterloo also had good results with their methods and made full disclosure of exactly what they did. Id. at 21. Waterloo’s approach consisted of three phases that it called: (i) interactive search and judging, (ii) active learning, and (iii) recall estimation. In the first phase of interactive search, human reviewers marked about 11,000 documents for relevance at a very fast speed of 3 documents per minute (180 files per hour). In the second stage of active learning, 50,000 documents were reviewed by human reviewers, but this time even faster. The reviewers attained a rate of 20 documents per minute, or to use the more familiar metric, 1,200 files per hour. That’s fast!
The end result of the three phases for the Waterloo team in TREC 2009 was that every document produced was reviewed by a human with computer assistance. Still, the number of documents reviewed was a small fraction of the entire document population (14,396 of 836,165 documents were reviewed, on average, per topic). Thus, even though human review was still used on every document produced, it was heavily guided by computer to get to that last review step, and the review itself was done very fast.
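To put the Waterloo figures in perspective, here is the back-of-the-envelope arithmetic as a short Python sketch. It uses only the counts and review rates quoted above; the hours it prints are simply what those rates imply, not hours the team actually reported.

```python
# Figures quoted above for the Waterloo team at TREC 2009.
reviewed_per_topic = 14_396   # documents reviewed, on average, per topic
collection_size = 836_165     # documents in the collection

fraction_reviewed = reviewed_per_topic / collection_size
reduction_factor = collection_size / reviewed_per_topic

print(f"fraction reviewed: {fraction_reviewed:.2%}")        # about 1.72%
print(f"collection is {reduction_factor:.0f}x larger")      # about 58x


def review_hours(documents, docs_per_minute):
    """Hours implied by reviewing `documents` at a steady rate."""
    return documents / docs_per_minute / 60


# Hours implied by the stated counts and rates for the first two phases.
search_hours = review_hours(11_000, 3)      # about 61 hours
learning_hours = review_hours(50_000, 20)   # about 42 hours

print(f"interactive search: {search_hours:.1f} hours")
print(f"active learning:    {learning_hours:.1f} hours")
```

The roughly 58-to-1 ratio for Waterloo is in the same range as the "approximately fifty times fewer documents" figure the authors give for computer-assisted review generally.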
Problems With the Scientific Studies to Date of Legal Search
There are many problems with the scientific research done to date on legal search, even the research done at the TREC Legal Track. Grossman and Cormack point out some of the issues in their article. The first years of TREC had a poor database from the Tobacco Litigation. It had too many scanned documents filled with scanning errors. That was fixed with the move to the Enron database. It has fewer such mistakes, but is still flawed in its own way, including the fact that it is by now well known by the research community. I have personally seen segments of the public Enron database so many times in software demos that I can now quickly find emails from a friend of mine that I know are in there.
The authors point out one limitation of TREC that I had not known about. The manual reviewers are not given a representative sample to review. Instead, the sample they review is preloaded with a higher than average share of relevant ESI. The deck is stacked. As a result, the human reviewers might have done better than they otherwise would have. As the authors put it:
The manual review was conducted on a stratified sample containing a higher proportion of relevant documents than the collection as a whole. Statistical inference was used to evaluate the result of reviewing every document in the collection. Beyond the statistical uncertainty – which is quantified by the significance level P – there is uncertainty as to whether manual reviewers would have had the same error rate had they reviewed the entire collection. It is not unreasonable to think that, because the proportion of relevant documents would have been lower in the collection than it was in the sample, reviewer recall and precision might have been even lower, because reviewers would have tended to miss the needles in the haystacks due to fatigue, inattention, boredom, and related human factors. This sampling effect, combined with the greater guidance provided to the human assessors, may have resulted in an overestimate of the effectiveness of manual review, and thus, understated the results of this study.
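The sampling point in that passage is easier to see with numbers. Below is a minimal sketch of how results from a stratified sample can be extrapolated to a whole collection, and why the sample the reviewers see can be far richer in relevant documents than the collection itself. The strata, sizes, and counts are hypothetical, and this is a textbook-style estimator chosen for illustration, not TREC's actual sampling design.

```python
# Hypothetical strata:
# (name, documents in stratum, documents sampled, sampled documents judged relevant)
strata = [
    ("likely relevant",   10_000,  500, 300),
    ("unlikely relevant", 790_000, 500,   5),
]


def estimate_relevant(strata):
    """Scale each stratum's sampled relevance rate up to the stratum's size."""
    return sum(size * (relevant / sampled)
               for _, size, sampled, relevant in strata)


total_docs = sum(size for _, size, _, _ in strata)
total_sampled = sum(sampled for _, _, sampled, _ in strata)
total_sampled_relevant = sum(relevant for _, _, _, relevant in strata)

# Share of relevant documents in what the reviewers actually looked at...
sample_prevalence = total_sampled_relevant / total_sampled
# ...versus the estimated share in the collection as a whole.
collection_prevalence = estimate_relevant(strata) / total_docs

print(f"estimated relevant documents: {estimate_relevant(strata):,.0f}")
print(f"prevalence in the sample:     {sample_prevalence:.1%}")
print(f"prevalence in the collection: {collection_prevalence:.1%}")
```

With these made-up numbers the reviewers see a sample that is about 30% relevant, while the collection is under 2% relevant. That gap is the "needles in the haystacks" concern the authors raise.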
Other flaws and issues they discuss with TREC show that scientific research of legal search is still in its early stages and has far to go. Other flaws were not discussed in the article, including what is to me the biggest, namely the lack of relevancy ranking. Documents are either relevant or not, whereas in practice, highly relevant documents are noted, and for good reason. They are far, far more important than documents of just average relevance. Thus any test that omits the reality of relevance weighting is of only limited value. If 1,000 barely relevant documents are not recalled in a search of 10,000, that is not a big problem. But if just one or two smoking guns are not recalled, that is a real problem. Testing without relevancy ranking is of only limited value to practicing attorneys. That is why I was pleased to learn from Grossman and Cormack that the TREC 2010 and 2011 Legal Tracks both evaluate relevance ranking. The TREC 2010 preliminary results are available through a link in the TREC 2011 Legal Task Preliminary Guidelines. TREC 2011 is just getting started, for those who would like to get involved.
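Here is a small sketch of why relevance weighting matters to a practicing attorney. It compares ordinary recall, where every relevant document counts the same, with a weighted recall in which smoking guns count for more. The weights and document counts are hypothetical and chosen only to make the point; this is not the measure used by the TREC Legal Track.

```python
# Hypothetical relevant set: two smoking guns and 1,000 barely relevant documents.
# The weights (500 vs. 1) are invented for illustration only.
weights = {"hot1": 500, "hot2": 500}
weights.update({f"doc{i}": 1 for i in range(1000)})


def recall(found, weights):
    """Ordinary recall: every relevant document counts equally."""
    return sum(1 for doc in weights if doc in found) / len(weights)


def weighted_recall(found, weights):
    """Recall in which each relevant document counts in proportion to its weight."""
    return sum(w for doc, w in weights.items() if doc in found) / sum(weights.values())


# Search A finds every barely relevant document but misses both smoking guns.
search_a = {f"doc{i}" for i in range(1000)}
# Search B finds both smoking guns but only half of the barely relevant documents.
search_b = {"hot1", "hot2"} | {f"doc{i}" for i in range(500)}

for name, found in (("A", search_a), ("B", search_b)):
    print(f"search {name}: recall {recall(found, weights):.1%}, "
          f"weighted recall {weighted_recall(found, weights):.1%}")
```

On an unweighted test Search A looks nearly perfect and Search B looks mediocre. Once the smoking guns are given their due, the ranking flips, which is the practical concern described above.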
Here is how Grossman and Cormack wrap up their article, Technology-Assisted Review in E-Discovery:
Overall, the myth that exhaustive manual review is the most effective – and therefore, the most defensible – approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort. Of course, not all technology assisted reviews (and not all manual reviews) are created equal. The particular processes found to be superior in this study are both interactive, employing a combination of computer and human input. While these processes require the review of orders of magnitude fewer documents than exhaustive manual review, neither entails the naïve application of technology absent human judgment. Future work may address which technology-assisted review process(es) will improve most on manual review, not whether technology-assisted review can improve on manual review.
To me this means there is hope. It means that we are already well on our way to inventing software and legal methods that mitigate the explosive, harmful impact of information overload on our legal systems.
The science of legal search is important, not just to other scientists, but also to attorneys, especially attorneys who specialize (or want to) in e-discovery. You need to learn what the scientific research shows and modify your practice accordingly. Law is a profession. It is an art and a craft, not a science. But when your art is grounded in science, it can improve. We need to learn from scientific experiments with search and retrieval what is effective and what is not.
But while we learn from science, we should never lose sight of the fact that we do not search for mere information; we search for evidence. We search for truth with the aim of justice. We have the entire legal structure to contend with, we have time to record, bills to pay, and clients, judges and other lawyers to answer to. Law is an art, not a science.
Scientific experiments cannot provide specific answers for lawyers, but can provide general guidance. We now know, for instance, that iteration works. Moreover, according to Jason R. Baron, preliminary TREC studies also show that more than one consultation with opposing counsel is rarely helpful, but one is very helpful. We know from H5’s tests at TREC that precise search terms are good, but need to be plentiful. We know from Waterloo’s tests that very fast human review is possible with computer assistance and helps. Best of all, we now know that manual review alone is no cure-all, no gold standard. In fact, it is not very good. We still have a lot to learn from science and Grossman and Cormack’s Technology-Assisted Review in E-Discovery is a good start. Fight information overload, set aside an hour, and read it without interruption. You may even want to go crazy and read Toffler’s books too.
I leave you with a few famous quotes by Alvin Toffler. They are more like tweets really, so have no concern: they won’t strain your already information-overloaded mind, and they might even help. Anyway, I liked them and hope you do too.
One of the definitions of sanity is the ability to tell real from unreal. Soon we’ll need a new definition.
You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment.