The Multi-Modal “Where’s Waldo?” Approach to Search and My Mock Debate with Jason Baron

At LegalTech New York a few weeks ago, Jason R. Baron and I staged a debate over search strategies and cooperation ethics. As you probably already know, Jason is the Director of Litigation for the National Archives and Records Administration and Co-Chair of The Sedona Conference Working Group on Electronic Document Retention and Production. Jason came up with a factual hypothetical for the debate involving a suit by the 3-D Start Up Corp. against Orange Corp. for copyright infringement, theft of trade secrets and breach of contract. I added a further hypothetical covering search-quality metrics and projected costs. Under the hypothetical, Jason and I were in charge of the e-discovery departments of competing law firms. Our mock argument took the form of presentations to the general counsel of Orange, each of us trying to persuade her to retain our own law firm and not the other. By the way, can you find the lawyer Waldo in the picture?

Our firms had very different strategies for search and retrieval. Jason’s firm proposed the traditional keyword search approach. My firm proposed a more diverse and innovative approach, which Jason characterized as unproven and vulnerable to attack. My proposal, which I called multi-modal, used keyword searches along with a variety of other judgmental and concept type search methods.

Although the competing firms agreed on the desirability of strategic cooperation, they had sharply different proposals on how to cooperate. The different approaches raise ethical issues that, in real life, both Jason and I feel are important.

Jeane A. Thomas, Partner and Chair of the e-Discovery Information Management Group of Crowell & Moring, played the role of the General Counsel of Orange, listening to and questioning our alternative proposals. Judge Paul Grimm played the role of a judge commenting on the differing proposals for legal services.

Pro-Keyword Gotcha Type Cooperation Argument

Jason argued that if his law firm were selected to represent Orange, he would cooperate with the plaintiff by agreeing, with only minor amendments, to their demand to use the 150 keywords they specified for the search and retrieval of ESI. Jason noted that although the proposed search terms were numerous and broad, it would still be advantageous to the client to accept this search protocol. He recommended that Orange cooperate with the plaintiff on this point because their keywords did not include the secret code name that some key Orange employees had used for the project: “AvatarApp.”

Jason noted, and I had to agree, that our mutual review of samples of Orange’s ESI made in advance of the meeting showed that the ESI containing the keyword “AvatarApp” was frequently detrimental to the defense of Orange, although not so severely as to be a smoking gun. Jason suggested that Orange agree to the plaintiff’s search demand in order to avoid producing ESI that would harm its defense. Jason argued that Orange should retain his firm, rather than mine, so that Orange could appear to be cooperative and at the same time avoid production of harmful information. This was an appealing type of “have your cake and eat it too” argument designed to get business for his law firm.

Jason’s argument at LegalTech for gotcha type cooperation is quite a stretch from his real beliefs. This was all just role playing for educational purposes. Still, it is true that Jason is a very competitive fellow, supposedly banned from family Monopoly games, and prone to one-upmanship in all areas, including cooperation. I for one would certainly not want to compete against Jason Baron in real life for e-discovery search services.

Jason created this hypothetical to follow up on the ethical problem of asymmetric knowledge, an issue that he first raised at the Mercer Ethics Conference in late 2008 (see page 866). His factual scenario for NY LegalTech continues and refines the theme of asymmetric knowledge, adding issues of real versus feigned cooperation. The scenario of a requesting party seeking to dictate the search terms is somewhat extreme, but not unheard of. Some requesting parties will propose terms as part of the request, but actively seek input from the producing party. This hypothetical shows what can happen when the requesting party does not seek to make the responding party a partner in the request and to place some, if not all, of the burden on the producing party to devise a reasonable search.

The ethical issues raised by the hypothetical implicate the professional duties of competence (ABA Model Rule of Professional Conduct 1.1), diligence (Rule 1.3), expediting litigation (Rule 3.2), candor (Rule 3.3), and fairness (Rule 3.4). Jason and I will further examine these e-discovery ethics issues at a symposium in Chicago at Northern Illinois University on April 16, 2010. Judge John Facciola and William Hamilton will also participate in the panel discussion.

Alternative Where’s Waldo? Multi-Modal Argument

While Jason got stuck at LegalTech arguing the old-line positions and wearing the black hat, I got to be the good guy. I proposed a truly cooperative approach using more advanced search techniques, which I called the “Where’s Waldo?” Multi-Modal approach. I proposed that Orange use both keyword and other alternative search methods, including what are often called concept search methods. I call that approach multi-modal because its essence is to use a variety of search methods, and not to rely on keyword search alone, or on concept search alone either. I call it a “Where’s Waldo?” type of multi-modal approach because I proposed that Orange control the search of its own ESI, and not allow the requesting party to dictate the search, and that Orange conduct the search in an impartial and transparent manner. This means they search to try to find the ball, not hide it, and look for all relevant ESI without regard to whether it is positive or negative. See my prior article Child’s Game of “Go Fish” is a Poor Model for e-Discovery Search, where I explain the inherent defects in keyword search as it is now conducted by most law firms and the advantages of my proposed alternative Where’s Waldo? approach.

I argued that this controlled, full disclosure Where’s Waldo? type of approach was advantageous to Orange, even in this situation where the requesting party did not know of, and thus did not propose, use of the key keyword “AvatarApp.” I proposed to use the AvatarApp term in our search, and also to use other language and patterns that we knew about and the requestor did not, since it was our data and we could look at all of it and they could not. I tried to persuade the general counsel that it was in the best interests of Orange to do this, even though I had to concede that it would uncover and lead to the production of a hoard of otherwise hidden, negative ESI.

Consistent with the Where’s Waldo? method, I suggested that we make full disclosure to the requesting party of our search methods, including the previously secret AvatarApp slang word, and we demonstrate how and why our search protocol was not only reasonable, but superior to theirs. In short, I proposed that we make our best efforts under budget constraints to find as much relevant evidence as possible, be it good, bad or indifferent. I suggested that it was a waste of time and money, and also invalid cooperation of dubious ethics, to slant the search so as to hide unfavorable ESI from the other side (except of course for privileged ESI). I questioned both the ethics and efficacy of Jason’s approach to accept the plaintiff’s uninformed, Go Fish guess-based keyword list as a way to hide unfavorable ESI. I tried to show that it was a flawed approach, certain only to waste money, and very unlikely to succeed in its dubious hide-the-ball goal.

Scientific Research Supports Multi-Modal

My idea of multi-modal is to create a recipe of search methods appropriate for the particular project. One project may rely heavily on keyword boolean search, with just a few alternatives, or maybe none at all. Another may rely heavily on linguistic analysis, or on new types of software, or other creative approaches. I had the pleasure of citing TREC Legal Track research against Jason to support my argument that a multi-modal approach would be much more effective than simple keyword search alone. See: Jason Baron on Search – How Do You Find Anything When You Have a Billion Emails? TREC shows that a variety of approaches works best, and that for some projects and issues boolean keyword search alone is very effective, but for others it fails miserably, and only alternative concept type searches will work. This is shown in the TREC chart below summarizing findings:

The TREC Legal Track research also shows that keyword search alone (by which I always mean boolean type keyword search that uses connector logic) accounts for only 22% of the relevant documents found in a seven million document database. The alternative search methods employed found the other 78% of the relevant documents. This supports my multi-modal argument that the most effective search method for a particular project will often require concept and other search methods to supplement keyword search. A master carpenter uses a number of different tools for most projects and does not rely on his hammer alone.
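
To make the idea of a search recipe concrete, here is a minimal sketch in Python of combining two methods and taking the union of their results. It is only an illustration under stated assumptions, not the protocol used in the debate or in TREC: the four sample documents, the function names and the 0.3 threshold are all hypothetical, and the word-overlap similarity is a crude stand-in for real concept search software.

```python
import re

def tokens(text):
    """Return the set of lowercase word tokens in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(docs, all_of=(), any_of=()):
    """Boolean keyword search: every term in all_of AND at least one in any_of."""
    hits = set()
    for doc_id, text in docs.items():
        words = tokens(text)
        has_all = all(term in words for term in all_of)
        has_any = not any_of or any(term in words for term in any_of)
        if has_all and has_any:
            hits.add(doc_id)
    return hits

def similarity_search(docs, seed_ids, threshold=0.3):
    """Stand-in for concept search (assumption): Jaccard word overlap
    with documents already known to be relevant (the seeds)."""
    seeds = [tokens(docs[seed_id]) for seed_id in seed_ids]
    hits = set()
    for doc_id, text in docs.items():
        words = tokens(text)
        for seed in seeds:
            union = words | seed
            if union and len(words & seed) / len(union) >= threshold:
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample data, invented for illustration only.
docs = {
    "d1": "AvatarApp launch schedule slipped again",
    "d2": "the 3-D rendering license terms for the contract",
    "d3": "lunch menu for friday",
    "d4": "launch schedule for the project slipped again this week",
}

keyword_hits = keyword_search(docs, any_of=("license", "contract"))
similar_hits = similarity_search(docs, seed_ids=["d1"])
combined = keyword_hits | similar_hits  # the multi-modal result set
print(sorted(combined))
```

In this toy data the guessed keywords find only d2, while the similarity pass, seeded with a document the producing party already knows is relevant, also pulls in d4, which never uses the code name.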

Metric Analysis of Projected Search and Review Costs

I also argued that my multi-modal approach would save the client money, lots of money, and uncover more relevant documents in the process. I showed by sampling that use of the 150 keywords proposed by the requesting party would produce far too many documents to review, approximately one million computer files, most of which would be irrelevant. My more precise multi-modal approach would, I contended, generate only 500,000 files, 50% fewer than the 150 keyword approach. Jason’s firm challenged my sampling and projections, but still had to concede that my methods would likely generate only 750,000 files, 25% fewer than the 150 keyword approach. I had to concede that my multi-modal search, using both keyword and various concept searches, along with iterative sampling, would cost more to set up and run than the simple keyword search. We agreed that the use of a multi-modal approach in this situation, with tens of millions of files to search and a large number of custodians, would cost $125,000 to perform, whereas the simple keyword search would cost only $25,000.

The $100,000 in higher initial search costs is more than offset, because the imprecision inherent in the 150 guessed keyword approach generates too many false positives. The one million files generated would drive up the costs of final review and production far more than the initial savings. In fact, my metrics, which the competing law firm could not rebut, showed total savings of $550,000 to $1,200,000 from using the more precise multi-modal approach. These savings flow naturally from the fact that the greatest costs in e-discovery are in review, and my approach resulted in 250,000 to 500,000 fewer files to review. For a full analysis of the costs of review metrics see my previously referenced metrics hypothetical. For an overview see the video below of my three-slide presentation at LegalTech.
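
The arithmetic behind these savings figures can be sketched in a few lines of Python. One caveat: the per-file review cost is not stated in this post. The $2.60 rate used below is an assumption, back-calculated because it is the single rate that reproduces both savings figures from the file counts and search costs given above. The function and constant names are likewise just illustrative.

```python
# Figures taken from the hypothetical described above.
KEYWORD_SEARCH_COST = 25_000       # dollars, simple keyword search
MULTIMODAL_SEARCH_COST = 125_000   # dollars, multi-modal search
KEYWORD_FILES = 1_000_000          # files returned by the 150 keywords

# Assumption (not stated in the post): review cost per file, in cents.
# 260 cents ($2.60) is the rate implied by the quoted savings range.
REVIEW_CENTS_PER_FILE = 260

def total_cost(search_cost, files):
    """Search cost plus full human review of every file, in whole dollars."""
    return search_cost + files * REVIEW_CENTS_PER_FILE // 100

def savings(multimodal_files):
    """Dollars saved by multi-modal search over the 150 keyword approach."""
    keyword_total = total_cost(KEYWORD_SEARCH_COST, KEYWORD_FILES)
    multimodal_total = total_cost(MULTIMODAL_SEARCH_COST, multimodal_files)
    return keyword_total - multimodal_total

print(savings(500_000))  # my projection of 500,000 files: 1200000
print(savings(750_000))  # the conceded 750,000 files:      550000
```

The same structure shows why the extra $100,000 in search costs matters so little: at the assumed rate, every 100,000 files removed from review is worth $260,000.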

The gotcha pseudo-cooperation approach of using the requesting party’s 150 guessed keywords might, and I emphasize might, succeed in hiding some of the bad documents. But my sampling and projected estimates of the cost to review showed that any such advantage would come at too high a cost. The metrics showed that the informed multi-modal approach would save Orange over a million dollars in e-discovery review costs. My presentation to get this case appealed to both the client’s sense of ethics and its pecuniary interests. The prospect of doing the right thing, and saving a million dollars, makes for a compelling argument, although the role-play client here, Jeane Thomas, never told us her decision.

She did, however, ask both of our firms to tell her what the savings would be to Orange to limit the final human review to a privilege review only. In this scenario Orange would use a kind of Quick Peek agreement, strengthened by a Rule 502 Order and an attorneys’ eyes only Confidentiality Order. They would not do a relevancy review, nor a confidentiality review and redaction. The privilege review itself would be partially automated with confirmation by expert human reviewers. Some privileged documents would certainly be produced, but Orange would be protected from waiver by court orders and agreement. The amount of money saved by drastically reducing human review in this way was staggering. The last slide in the above movie sets this out. By limiting the expense of human reviewers to partial privilege review and logging, you save $2,000,000 under the keyword approach, and from $1,000,000 to $1,500,000 using multi-modal. The multi-modal approach was still overall less expensive than keyword, saving between $150,000 and $400,000.

My appeal to metrics and cost analysis to counter Jason’s hide-the-ball arguments was, like the citation to TREC Legal Track, using Jason’s own petard against him. He was a good sport to set himself up in that way. Jason is the Editor in Chief of one of the lead articles on metrics, The Sedona Conference Commentary on Achieving Quality in the E-Discovery Process (2009). See my prior article on this important work: Sedona on Quality: a Must-Read Commentary. Again, let me stress that Jason was just arguing a position here, and the mock argument obviously does not reflect his personal views, which are certainly not of the hide-the-ball variety. Indeed, the next time we do this particular educational skit, I may lose the coin toss. Then I will have to argue for keyword search and feigned cooperation and Jason will wear the white hat and argue for innovative search and bona fide cooperation.

Strategic Cooperation

In our mock argument Jason told the general counsel of Orange that his approach would appear to be very cooperative, since it involved acceptance of the plaintiff’s search strategy, whereas mine would not. Jason argued that the side that appears to be most cooperative will have a strategic advantage in the case, especially with the supervising judge. I agreed with the latter point, but disagreed with the rest. I argued that the approach recommended by Jason’s firm would not fool anyone for long, including the judge. My firm’s approach to cooperation was genuine. Although the refusal to accept the other side’s 150 guessed keywords might appear uncooperative at first, over time it would become obvious that it was driven by a desire for true cooperation. It was driven by the desire to get at as much of the truth as possible under the constraints of time and money placed by this case. My approach would fulfill the ethical duties of candor to the court and fairness to the opposing party and counsel. We would do so by voluntarily disclosing the secret AvatarApp word to the other side at the initial 26(f) conference. This would also fulfill our ethical duty under Rule 3.2 to expedite litigation, not to mention the prime directive of the Federal Rules of Civil Procedure, Rule 1, which calls for the just, speedy and inexpensive resolution of every case.

I argued that the deception of only using the requesting party’s keywords, which did not include the all-important AvatarApp keyword, would eventually be uncovered. Some of the documents using the code name AvatarApp would likely turn up because they also contained one or more of the plaintiff’s keywords. The plaintiff and its attorneys would then know that they had been had. They would discover that the supposed cooperation of the defense was all along nothing but a trap to allow the plaintiff and its legal counsel to be hoisted by their own petards. In this case the petard, the grenade, was the uninformed arrogance of plaintiff’s counsel to think they knew enough to dictate search terms. Some might think, serves them right, and indeed it would. But what is the result of this feigned cooperation, this clever discovery gamesmanship? How does plaintiff’s counsel then react? They counter-attack, and this time with great fury.

After a maneuver like that, there is a complete lack of trust and a flurry of expensive motion practice ensues. The plaintiff would argue bad faith and false cooperation on the part of Orange and its counsel. They would argue fraudulent concealment, that Orange had a duty to disclose the keyword AvatarApp, but did not. They would move for sanctions and to compel another search using the AvatarApp keyword. They would demand a do-over at Orange’s expense. They might persuade the judge and win on some points. The court could order an expensive do-over. Sanctions might even be imposed.

Jason’s firm objected to this part of my argument, and asserted that the plaintiffs would again lose this battle, since we merely followed their demands. I conceded that a do-over was by no means certain. There is authority and logic behind applying the doctrine of estoppel against the plaintiff in this situation. Also, the doctrine of invited error could apply. The plaintiff got what it asked for and then stipulated to. Jason’s clever hide-the-ball petard trap strategy could succeed, depending on the quality of opposing counsel and the judge. On the other hand, a judge ruling on this issue could well make the producing party, Orange, pay for at least some of the cost of a second search and production. In any event, litigating the issue would certainly be expensive. I argued that this tactic was ethically questionable, unlikely to prevent the discovery of harmful evidence, and very likely to inflate litigation costs.

How Will the Courts and Clients React?

When we were finished with the competing proposals and arguments, it was then Judge Grimm’s turn to comment. Personally, I had hoped he would indicate how he would rule in such a scenario. Judge Grimm was, however, cautious. He did not indicate how he would rule on this sticky issue of a do-over and who should pay for it. Still, having read Judge Grimm’s many legal opinions and writings over the years, I am confident that if this scenario were presented to him in his courtroom, he would rule against the producing party that snookered the requesting party.

Judge Grimm understands the limits of keyword searches and the ethical duties of cooperation, competence, diligence, expeditiousness, fairness and candor. The feigned cooperation maneuver that Jason’s hypothetical raises is not likely to succeed in front of a sophisticated judge who is knowledgeable in the ways of e-discovery. Still, there is a dearth of e-discovery expert judges, and no legal authority on this issue at this time, nor is such authority likely to come soon. For these reasons, the argument of feigned strategic cooperation will be quite tempting to many attorneys for many years to come.

It will be especially tempting to litigants trying to decide which law firm to retain to help win an emotional or high stakes “bet-the-company” type of lawsuit. That is why I found Jason’s hypothetical and mock-debate to be especially interesting and important to the profession. Let us hope that litigants will have the opportunity to hear both sides of the argument and the courage and financial sense to make the right decision. I am concerned that only the tough-guy voices of feigned cooperation will be heard, at least at first. I am concerned that clients will not hear the more restrained voices of cooperation, and even if they do, they will not understand the financial savings that these innovative approaches can make possible.

Rulings by our leading judges should help get this message across to litigants so that the quiet voices of reason by competing outside counsel can be heard. Strong opinions on sanctions motions can, and I am confident will, send clear messages to litigants and their attorneys. These rulings already have and will continue to encourage real cooperation and new and improved search and production methods.

Conclusion

The legal profession in the United States is now preoccupied with playing electronic Go Fish-like games of keyword guessing. We can continue this business as usual, and we can continue to over-review and over-produce unwanted mountains of data. We can ignore the cooperative Where’s Waldo multi-modal approaches. But should we? Can our clients, and society as a whole, afford to continue the old ways of hide-the-ball discovery gamesmanship? I think not. It’s too expensive and it’s morally bankrupt.

We should not give up the Twentieth Century American tradition of discovery, especially e-discovery, as some contend. We should, however, change our attitudes and move to a cooperative model of discovery. When it comes to e-discovery in particular, we should move to a smarter more high-tech oriented model. Keyword search is so last century. The scientific research is in and it shows that keyword search alone usually does not work. It just produces inefficient searches and giant haystacks of irrelevant data that are incredibly expensive to review.

The research shows that we have to employ new, alternative methods that vary according to the needs of the particular case. We should embrace concept search inclusive, multi-modal, Where’s Waldo approaches to e-discovery. We should not just walk away from e-discovery entirely, as most attorneys today are still doing. Electronic discovery is too expensive now, and something to be avoided, because it is usually carried out under the old paper discovery model of gamesmanship and “any and all” productions. The truth is, most litigation attorneys lack the technical competence and attitude needed for e-discovery.

Electronic discovery is too expensive at this point because the profession has been unable to change its ways fast enough to keep up with the mind-boggling advances of technology. See, e.g.: e-Discovery: Did You Know? We have to pick up the pace and become comfortable with the new technologies. We have to understand that in today’s world of terabytes and exabytes of ESI, no one can afford the whole truth. We have to rein in document reviews with proportionality. That has to start with smarter search that generates smaller document sets. The multi-modal cooperative Waldo approach that I argued at LegalTech is one way to get there.

I do not contend that it is the only way to get there. You can, for instance, have a multi-modal approach that is not also Where’s Waldo. You could, for instance, allow the requester to be an equal partner in designing search protocols. You could engage in an iterative series of negotiated multi-modal sessions where the parties meet to come up with the best search recipe for the case. This would, however, necessarily entail the transparency aspects of the Where’s Waldo approach. The producing party would test and report back in order to make these meetings meaningful. A series of Go Fish guessing games, with no tests and sampling in between would, I contend, be a big waste of time. Even with tests, sampling and transparency, the TREC research shows rapidly diminishing returns after the first two meetings.

It seems obvious to me that the best role of the requesting party is to specify what they are looking for: what does the Waldo they want look like? The Where’s Waldo driven counsel meetings would focus on the requester explaining what they are looking for, narrowing the request and making it more specific. The search design would be controlled by the producing party. The requesting party is in no position to design a search of data that they have not seen, and can never see. More than one meeting may still be necessary under Waldo, as the producing party will need to report back, explain the search they have used and why, and hopefully get buy-in from the requester. The requesting party may even have some good search suggestions from time to time, and I am not saying they should not be heard by the producing party. I am just saying they should not dictate or control the search.

New search methods and cooperative attitudes are the best way out of the e-discovery morass we are now in, not rewriting the rules once again. The rules are pretty much fine as they are (although I would make a Rule 16(b) hearing mandatory). We do not need to abandon discovery or dramatically change the rules of the game. We need to improve our game skills and attitude. We need to think different and to cooperate. We need to channel our adversarial skills and arguments to the meaning of the law and the facts, not the hiding of facts. The desire of many trial lawyers today to control the facts, and rewrite history, so that they can win a case is misguided. This is what is ruining litigation today, not discovery or e-discovery per se.

Electronic discovery is overly expensive today because it is driven by this type of misplaced adversarial attitude, compounded by a lack of competence and over-reliance on vendors. Vendors have their place, and are often a key part of a good e-discovery team, but they are not lawyers and should never be in charge of e-discovery. Most of them profit from keyword search models of over-retrieval and over-review. So too do many law firms with their armies of reviewers.

The over-review models that dominate e-discovery today are doomed. The future belongs instead to a cooperative, ambidextrous, concept laden fellow named Waldo. The problem is, at this still early stage of the game, he can be awfully hard to find.

This entry was posted on Saturday, February 27th, 2010 at 4:27 pm and is filed under Lawyers Duties, Review, Search.

6 Responses to The Multi-Modal “Where’s Waldo?” Approach to Search and My Mock Debate with Jason Baron

Video Supplement to the LegalTech Debate on Keyword Search v. Multimodal Search « e-Discovery Team says:

March 7, 2010 at 12:39 pm

[…] view with a high-speed connection in high definition (HD), full screen mode. This video supplements last week’s blog on the keyword search debate at LegalTech. It is a video excerpt of a law school class where I […]

The Trials and Tribble-ations of the Data Deluge « e-Discovery Team says:

March 14, 2010 at 8:18 pm

[…] Where indeed? That is the golden question today in e-discovery. Where is the highly relevant evidence and how do we find it? In my view the answer lies in a multimodal Where’s Waldo? approach, one that is bottom line driven based on cost and Rule 26(b)(2)(C). But that’s another story. […]

Survey of 103 e-Discovery Cases in the First Half of 2010, the “Campbell Soup” case, and the Wisdom of Andy Warhol « e-Discovery Team says:

July 18, 2010 at 5:18 pm

[…] Go Fish approach is a model of inefficiency. See: The Multi-Modal “Where’s Waldo?” Approach to Search and My Mock Debate with Jason Baron, and Child’s Game of “Go Fish” is a Poor Model for e-Discovery Search. The actions of […]

Geoff Black's Forensic Gremlins » Blog Archive » eDiscovery Review and Predictive Coding with Statistics - Everything that gives you fits in Digital Forensics and E-Discovery says:

October 14, 2010 at 11:55 pm

[…] also in time [not] wasted in the courtroom. Jason R. Baron and Ralph Losey literally talk about it all the […]

New Methods for Legal Search and Review « e-Discovery Team ® says:

March 18, 2012 at 7:14 pm

[…] work far better. See: Child’s Game of “Go Fish” is a Poor Model for e-Discovery Search; The Multi-Modal “Where’s Waldo?” Approach to Search and My Mock Debate with Jason Baron; Secrets of Search: Parts One, Two, and […]

Changing the Conversation in eDiscovery: Where Do We Go From Here? says:

November 20, 2012 at 10:42 am

[…] and there are very few surprises left to be hashed out. And while interesting, debating whether a “hybrid multi-modal” predictive approach is superior to a pure Latent Semantic approach is precisely the phenomena […]
