“The Hacker Way” – What the e-Discovery Industry Can Learn From Facebook’s Management Ethic

August 18, 2013

Facebook’s regulatory filing for its initial public stock offering included a letter to potential investors by 27-year-old billionaire Mark Zuckerberg. The letter describes the culture and approach to management that he follows as CEO of Facebook. Zuckerberg calls it the Hacker Way. Mark did not invent this culture. In a way, it invented him. It molded him and made him and Facebook what they are today. This letter reveals the secrets of Mark’s success and establishes him as the current child prodigy of the Hacker Way.

Too bad most of the CEOs in the e-discovery industry have not read the letter, much less understood how Facebook operates. They are clueless about the management ethic it takes to run a high-tech company.

An editorial in Law Technology News explains why I think most of the CEOs in the e-discovery software industry are just empty suits. They do not understand modern software culture. They think the Hacker Way is a security threat. They are incapable of creating insanely great software. They cannot lead with the kind of inspired genius that the legal profession now desperately needs from its software vendors to survive the data deluge. From what I have seen, most of the pointy-haired management types that now run e-discovery software companies should be thrown out. They should be replaced with Hacker-savvy management before their once proud companies go the way of the BlackBerry. The LTN article, Vendor CEOs: Stop Being Empty Suits & Embrace the Hacker Way, has more details on the slackers in silk suits. This essay, a partial rerun from a prior blog, gives you the background on Facebook’s Hacker Way.

Hacker History

The Hacker Way tradition has been around since at least the sixties. It has little or nothing to do with illegal computer intrusions. Moreover, to be clear, NSA leaker Edward Snowden is no hacker. All he did was steal classified information, put it on a thumb drive, meet the press, and then flee the country, to communist dictatorships no less. That has nothing to do with the Hacker Way and everything to do with politics.

The Hacker Way – often called the hacker ethic – has nothing to do with politics. It did not develop in government like the Internet did, but in the hobby of model railroad building and MIT computer labs. This philosophy is well-known and has influenced many in the tech world, including the great Steve Jobs (who never fully embraced its openness doctrines), and Steve’s hacker friend, Steve Wozniak, the laughing Yoda of the Hacker Way. The Hacker approach is primarily known to software coders, but can apply to all kinds of work. Even a few lawyers know about the hacker work ethic and have been influenced by it.

Who is Mark Zuckerberg?

We have all seen a movie version of Mark Zuckerberg in The Social Network. The real Zuckerberg, by the way, will still own 56.9% voting control of Facebook after the public offering later this year. But who is Mark Zuckerberg really? His Facebook page may reveal some of his personal life and ideas, but how did he create a hundred-billion-dollar company so fast?

How did he change the world at such a young age? There are now over 850 million people on Facebook with over 100 billion connections. On any one day there are over 500 million people using Facebook. These are astonishing numbers. How did this kind of creative innovation and success come about? What drove Mark and his hacker friends to labor so long, and so well? The letter to investors that Mark published gives us a glimpse into the answer and a glimpse into the real Mark Zuckerberg. Do I have your full attention yet?

The Hacker Way philosophy described in the investor letter explains the methods used by Mark Zuckerberg and his team to change the world. Regardless of who Mark really is, greedy guy or saint (or like Steve Jobs, perhaps a strange combination of both), Mark’s stated philosophy is very interesting. It has applications to anyone who wants to change the world, including those of us trying to change the law and e-discovery.

Hacker Culture and Management

Mark’s letter to investors explains the unique culture and approach to management inherent in the Hacker Way that he and Facebook have adopted.

As part of building a strong company, we work hard at making Facebook the best place for great people to have a big impact on the world and learn from other great people. We have cultivated a unique culture and management approach that we call the Hacker Way.

The word ‘hacker’ has an unfairly negative connotation from being portrayed in the media as people who break into computers. In reality, hacking just means building something quickly or testing the boundaries of what can be done. Like most things, it can be used for good or bad, but the vast majority of hackers I’ve met tend to be idealistic people who want to have a positive impact on the world.

The Hacker Way is an approach to building that involves continuous improvement and iteration. Hackers believe that something can always be better, and that nothing is ever complete. They just have to go fix it — often in the face of people who say it’s impossible or are content with the status quo.

Hackers try to build the best services over the long term by quickly releasing and learning from smaller iterations rather than trying to get everything right all at once. To support this, we have built a testing framework that at any given time can try out thousands of versions of Facebook. We have the words ‘Done is better than perfect’ painted on our walls to remind ourselves to always keep shipping.

Hacking is also an inherently hands-on and active discipline. Instead of debating for days whether a new idea is possible or what the best way to build something is, hackers would rather just prototype something and see what works. There’s a hacker mantra that you’ll hear a lot around Facebook offices: ‘Code wins arguments.’

Hacker culture is also extremely open and meritocratic. Hackers believe that the best idea and implementation should always win — not the person who is best at lobbying for an idea or the person who manages the most people.

To encourage this approach, every few months we have a hackathon, where everyone builds prototypes for new ideas they have. At the end, the whole team gets together and looks at everything that has been built. Many of our most successful products came out of hackathons, including Timeline, chat, video, our mobile development framework and some of our most important infrastructure like the HipHop compiler.

To make sure all our engineers share this approach, we require all new engineers — even managers whose primary job will not be to write code — to go through a program called Bootcamp where they learn our codebase, our tools and our approach. There are a lot of folks in the industry who manage engineers and don’t want to code themselves, but the type of hands-on people we’re looking for are willing and able to go through Bootcamp.

So sayeth Zuckerberg. Hands-on is the way.

Application of the Hacker Way to e-Discovery

E-discovery needs that same hands-on approach. E-discovery lawyers need to go through bootcamp too, even if they primarily just supervise others. Even senior partners should go, at least if they purport to manage and direct e-discovery work. Partners should, for example, know how to use the search and review software themselves, and from time to time, do it, not just direct junior partners, associates, and contract lawyers. You cannot manage others at a job unless you can actually do the job yourself. That is the hacker key to successful management.

Also, as I often say, to be a good e-discovery lawyer, you have to get your hands dirty in the digital mud. Look at the documents; don’t just theorize about them or about what might be relevant. Bring it all down to earth. Test your keywords, don’t just negotiate them. Prove your search concept by the metrics of the search results. See what works. When it doesn’t, change the approach and try again. Plus, in the new paradigm of predictive coding, where keywords are just a start, the SMEs must get their hands dirty. They must use the software to train the machine. That is how the artificial intelligence aspects of predictive coding work. The days of hands-off theorists are over. Predictive coding work is the ultimate example of code wins arguments.

Iteration is king of ESI search and production. Phased production is the only way to do e-discovery productions. There is no one final, perfect production of ESI. As Voltaire said, perfect is the enemy of good. For e-discovery to work properly it must be hacked. It needs lawyer hackers. It needs SMEs who can train the machine on what is relevant, on what evidence must be found to do justice. Are you up to the challenge?

Mark’s Explanation to Investors of the Hacker Way of Management

Mark goes on to explain in his letter to investors how the Hacker Way translates into the core values for Facebook management.

The examples above all relate to engineering, but we have distilled these principles into five core values for how we run Facebook:

Focus on Impact

If we want to have the biggest impact, the best way to do this is to make sure we always focus on solving the most important problems. It sounds simple, but we think most companies do this poorly and waste a lot of time. We expect everyone at Facebook to be good at finding the biggest problems to work on.

Move Fast

Moving fast enables us to build more things and learn faster. However, as most companies grow, they slow down too much because they’re more afraid of making mistakes than they are of losing opportunities by moving too slowly. We have a saying: “Move fast and break things.” The idea is that if you never break anything, you’re probably not moving fast enough.

Be Bold

Building great things means taking risks. This can be scary and prevents most companies from doing the bold things they should. However, in a world that’s changing so quickly, you’re guaranteed to fail if you don’t take any risks. We have another saying: “The riskiest thing is to take no risks.” We encourage everyone to make bold decisions, even if that means being wrong some of the time.

Be Open

We believe that a more open world is a better world because people with more information can make better decisions and have a greater impact. That goes for running our company as well. We work hard to make sure everyone at Facebook has access to as much information as possible about every part of the company so they can make the best decisions and have the greatest impact.

Build Social Value

Once again, Facebook exists to make the world more open and connected, and not just to build a company. We expect everyone at Facebook to focus every day on how to build real value for the world in everything they do.

________

Applying the Hacker Way of Management to e-Discovery


Focus on Impact

Law firms, corporate law departments, and vendors need to focus on solving the most important problems, the high costs of e-discovery and the lack of skills. The cost problem primarily arises from review expenses, so focus on that. The way to have the biggest impact here is to solve the needle in the haystack problem. Costs can be dramatically reduced by improving search. In that way we can focus and limit our review to the most important documents. This incorporates the search principles of Relevant Is Irrelevant and 7±2 that I addressed in Secrets of Search, Part III. My own work has been driven by this hacker focus on impact and led to my development of Bottom Line Driven Proportional Review and multimodal predictive coding search methods. Other hacker oriented lawyers and technologists have developed their own methods to give clients the most bang for their buck.

The other big problem in e-discovery is that most lawyers do not know how to do it, and so they avoid it altogether. This in turn drives up the costs for everyone because it means the vendors cannot yet realize large economies of scale. Again, many lawyers and vendors understand that lack of education and skill sets is a key problem and are focusing on it.

Move Fast

This is an especially challenging dictate for lawyers and law firms because they are overly fearful of making mistakes, of breaking things as Facebook puts it. They are afraid of looking bad and of malpractice suits. But the truth is, professional malpractice suits are very rare in litigation. Such suits happen much more often in other areas of the law, like estates and trusts, property, and tax. As far as looking bad goes, they should be more afraid of the bad publicity from not moving fast enough, which is a much more common problem, one that we see daily in sanctions cases. Society is changing fast; if you aren’t too, you’re falling behind.

The problem of slow adoption also afflicts the bigger e-discovery vendors, who often drown in bureaucracy and are afraid to make big decisions. That is why you see individuals like me starting an online education program, while the big boys keep on debating. I have already changed my e-Discovery Team Training program six times since it went public almost two years ago. ‘Code wins arguments.’ Lawyers must be especially careful of the thinking man’s disease, paralysis by analysis, if they want to remain competitive.

A few lawyers and e-discovery vendors understand this hacker maxim and do move fast. A few vendors appreciate the value of getting there first, but fewer law firms do. It seems hard for most law firm management to understand that the risks of lost opportunities are far more dangerous and certain than the risks of making a few mistakes along the way. The slower, too conservative law firms are already starting to see their clients move business to the innovators, the few law firms who are moving fast. These firms have more than just puffed-up websites claiming e-discovery expertise; they have dedicated specialists and, in e-discovery at least, they are now far ahead of the rest of the crowd. Will the slow and timid ever catch up, or will they simply dissolve like Heller Ehrman, LLP?

Be Bold

This is all about taking risks and believing in your visions. It is directly related to moving fast and embracing change; not for its own sake, but to benefit your clients. Good lawyers are experts in risk analysis. There is no such thing as zero risk, but there is certainly a point of diminishing returns for every litigation activity that is designed to control risks. Good lawyers know when enough is enough and constantly consult with their clients on cost-benefit analysis. Should we take more depositions? Should we do another round of document checks for privilege? Often lawyers err on the side of caution without consulting with their clients on the costs involved. They follow an overly cautious approach from which the lawyers profit through more fees. Who are they really serving when they do that?

The adoption of predictive coding provides a perfect example of how some firms and vendors understand technology and are bold, and others do not and are timid. The legal profession is like any other industry, it rewards the bold, the innovators who create new legal methods and law for the benefit of their clients. What client wants a wimpy lawyer who is over-cautious and just runs up bills? They want a bold lawyer, who at the same time remains reasonable, and involves them in the key risk-reward decisions inherent in any e-discovery project.

Be Open

In the world of e-discovery this is all about transparency and the strategic lowering of the wall of work product. Transparency is a proven method for building trust in discovery. Selective disclosure is what cooperation looks like. It is what is supposed to happen at Rule 26(f) conferences, but seldom does. The attorneys who use openness as a tool are saving their clients needless expense and disputes. They are protecting them from dreaded redos, where a judge finds that you did a review wrong and requires you to do it again, usually under very short timelines. There are limits to openness, of course, and lawyers have an inviolate duty to preserve their clients’ secrets. But that still leaves room for disclosure of information on your own methods of search and review when doing so will serve your client’s interests.

Build Social Value 

The law is not a business. It is a profession. Lawyers and law firms exist to do justice. That is their social value. We should never lose sight of that in our day-to-day work. Vendors who serve the legal profession must also support these lofty goals in order to provide value. In e-discovery we should serve the prime directive, the dictates of Rule 1, for just, speedy, and inexpensive litigation. We should focus on legal services that provide that kind of social value. Profits to the firm should be secondary. As Zuckerberg said in the letter to potential investors:

Simply put: we don’t build services to make money; we make money to build better services.

This social value model is not naive; it works. It eventually creates huge financial rewards, as a number of e-discovery vendors and law firms are starting to realize. But that should never be the main point.

Conclusion

Facebook and Mark Zuckerberg should serve as an example to everyone, including e-discovery lawyers and vendors. I admit it is odd that we should have to turn to our youth for management guidance, but you cannot argue with success. We should study Zuckerberg’s 21st Century management style and Hacker Way philosophy. We can learn from its tremendous success. Zuckerberg and Facebook have proven that these management principles work in the digital age. It is true if it works. That is the pragmatic tradition of American philosophy. We live in fast-changing times. Embrace change that works. As the face of Facebook says: “The riskiest thing is to take no risks.”


Why Would a Receiving Party Want to Use Predictive Coding?

August 12, 2013

Predictive coding software is not just a game-changer for producing parties; it is invaluable for receiving parties as well, especially those faced with document dumps. Good predictive coding software ranks all documents in a collection according to the attorney trainer’s conception of relevance (or responsiveness). The software then orders all of the documents, from the most important to the least relevant. The document ranking feature thereby empowers a receiving party to cull out the marginally irrelevant, or totally irrelevant, documents that often clutter document productions. The receiving party can then review only the documents that it thinks are of the most importance to the case, the documents they want, and ignore the rest. That saves valuable time and effort, and transforms an ugly, imprecise document dump into a delicious, high-tech feast of low-hanging fruit.
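For the technically inclined, the core ranking idea can be sketched in a few lines of Python. This is only a toy illustration built on the open-source scikit-learn library, not any particular vendor’s review platform; the documents and labels shown are hypothetical.

```python
# Toy sketch of predictive coding's ranking step, using scikit-learn.
# Real review platforms work on the same principle at far larger scale.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical seed set coded by the attorney trainer (1 = relevant).
train_docs = ["merger term sheet and negotiation notes",
              "holiday party invitation for all staff"]
train_labels = [1, 0]

# The full production received from the other side (placeholder docs).
production = ["draft merger agreement", "fantasy football standings",
              "board minutes on merger vote"]

vec = TfidfVectorizer()
model = LogisticRegression().fit(vec.fit_transform(train_docs), train_labels)

# Score every produced document with its predicted probability of
# relevance, then review in ranked order, best documents first.
scores = model.predict_proba(vec.transform(production))[:, 1]
for score, doc in sorted(zip(scores, production), reverse=True):
    print(f"{score:.2f}  {doc}")
```

The point of the sketch is the last loop: once every document carries a relevance score, the receiving party simply starts reading from the top and stops when the scores, and the documents, stop mattering.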


Document Dumps

Document dumps are sometimes an intentional bad faith tactic by a producing party designed to hide evidence or overwhelm the requesting party, but are usually inadvertent or the result of carelessness. See, e.g., Gottlieb v. Iskowitz, 2012 WL 2337290 (Cal. Ct. App. June 20, 2012) (default judgment entered as sanction for intentional, bad faith document dump in violation of court order); Branhaven LLC v. Beeftek, Inc., _F.R.D._, 2013 WL 388429 at *3 (D. Md. Jan. 4, 2013) (sanctions entered under Rule 26(g) for discovery abuses, including a document dump); Losey, R., The Increasing Importance of Rule 26(g) to Control e-Discovery Abuses; Denny & Cochran, The ESI Document Dump in White Collar Cases; In re Fontainebleau Las Vegas, 2011 U.S. Dist. LEXIS 4105 (S.D. Fla. 2011); Fisher-Price, Inc. v. Kids II, Inc., 2011 WL 6409665 (W.D.N.Y. Aug. 10, 2012); Rajala v. McGuire Woods, LLP, 2013 WL 50200 (D. Kan. Jan. 3, 2013) (no sanctions entered and clawback enforced because the judge saw no evidence that plaintiff’s counsel, in producing the privileged material, “intended to overwhelm or burden the receiving party [with] documents largely irrelevant to the litigation.”)

I would like to think the intentional bad faith actions are rare. That has been my experience. What I usually see are over-productions arising out of concern about missing relevant documents. The producing party does not want to face expensive motions to compel, or worse, motions for sanctions for withholding relevant documents, as seen for instance in several of the cases cited above.

Other producing parties make document dumps because they simply do not know any better. They do not have predictive coding software and are relying on imprecise search methods, like keyword or human review. Alternatively, they have predictive coding type software, but they licensed the wrong kind of software (hey, it was cheaper) and it was not any good. Sometimes they chose good software, but do not know how to use it properly. Most vendors have very poor education systems and few are willing to tell their clients, who are often hard-nosed lawyers, that they are not using it right. The reality is, despite the ads, there is no easy button on good search. Just the contrary. Other times, a lawyer will have the right tools, and the skills to use them, but not the time to do the job right. They have a deadline to meet and do not have time to make a more precise production. They would rather err on the side of recall and full production.

For all of these reasons, and more, a producing party will often err on the side of over-production, either on purpose or due to negligence or inadequate time. Sometimes even when the producing party does a perfect job, there will still be too many documents for a receiving party to review them all within the time and money constraints of the case. Is this still a document dump? Not really, but it is still too-much-information, more than the receiving party needs.

Whatever the cause, in many cases, especially large cases, the receiving party ends up with a haystack of documents that effectively hides the information they need to prepare for trial. The receiving party then has essentially the same problem the producing party had, too-much-information, although on a lesser scale. Instead of making a fuss, and engaging in often futile motion practice, the smart receiving party will use predictive coding to sort the documents. They will then only review the documents that they want to see. That is the great strength and beauty of relevancy ranking, a feature that can only be found with predictive coding type software, as I will explain in greater detail in a future blog.

Relevant Is Irrelevant

Remember my fourth secret of search, relevant is irrelevant? See Secrets of Search – Part III. This Zen koan means that merely relevant documents are not important to a case. What really counts are the hot documents, the highly relevant. The others are nothing, just more of the same.

There can be hundreds of thousands of technically relevant documents in a collection, especially when an overly broad production request has been made, and complied with. The proper response to an over-broad request is, of course, an objection, dialogue with the requesting party, and failing resolution, a motion for protective order. But sometimes the court may get it wrong and order over-production, or the responding party might not care. Perhaps, for instance, the law firm profits from over-review. I have heard that still goes on. Or perhaps the responding party wanted to go ahead and produce everything requested, saving only privileged documents, for other reasons, such as saving money by forgoing a careful review. Maybe they are making a document dump on purpose to hide the needles of hot documents in a haystack of the merely relevant. Would it not be ethical to do so if the requesting party asked for this dump, perhaps even insisted on it? The requesting party is just getting what they asked for.

Conclusion

Whatever the reasons, sometimes the requesting party receives far too many documents, many more than they wanted. They are then in the position that most producing parties are in when they review their clients’ documents: they have too much information. That is why the relevancy ranking ability is a great reason for a receiving party to use predictive coding software to review large document productions. Even if the production is not a dump at all, and is just large, the receiving party needs help from predictive coding software to go beyond the merely relevant to the highly relevant.


Comparative Efficacy of Two Predictive Coding Reviews of 699,082 Enron Documents

June 17, 2013

This is the conclusion of the report on the Enron document review experiment that I began in my last blog, A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents. The conclusion is an analysis of the relative effectiveness of the two reviews. Prepare for surprises. Artificial intelligence has come a long way.

The Monomodal method, which I nicknamed the Borg review for its machine dominance, did better than anticipated. Still, as the graphic suggests, it came up short in the key component: finding Hot documents. Yes, there is still a place for keyword and other types of search. But it is growing smaller every year.

Description of the Two Types of Predictive Coding Review Methods Used

When evaluating the success of the Monomodal, all-predictive-coding approach in the second review, please remember that this is not pure Borg. I would not spend 52 hours of my life doing that kind of review. I doubt any SME or search expert would do so. Instead, I did my version of the Borg review, which is quite different from that endorsed by several vendors. I call my version the Enlightened Hybrid Borg Monomodal review. Losey, R., Three-Cylinder Multimodal Approach To Predictive Coding. I used all three cylinders described in this article: one for random, a second for machine analysis, and a third cylinder powered by human input. The only difference from full Multimodal review is that the third engine of human input was limited to predictive coding based ranked searches.

This means that in the version of Monomodal review tested the random selection of documents played only a minor role in training (thus an Enlightened approach). It also means that the individual SME reviewer was allowed to supplement the machine selected documents with his own searches, which I did, so long as the searches were predictive coding based (thus the Hybrid approach, Man and Machine). For example, with the Hybrid approach to Monomodal the reviewer can select documents for review for possible training based on their ranked positions. The reviewer does not have to rely entirely on the computer algorithms to select all of the documents for review.

The primary difference between my two reviews was that the first Multimodal method used several search methods to find documents for machine training, including especially keyword and similarity searches, whereas the second did not. Only machine learning type searches were used in the Monomodal search. Otherwise I used essentially the same approach as I would in any litigation, and budgeted my time and expense to 52 hours for each project.

Both Reviews Were Bottom Line Driven

Both the Monomodal and Multimodal reviews were tempered by a Bottom Line Driven approach. This means the goal of the predictive coding culling reviews was a reasonable effort in which an adequate number of relevant documents were found. It was not an unrealistic, over-expensive effort. It did not include a vain pursuit of more of the same type of documents. Those documents would never find their way into evidence anyway, and would never lead to new evidence. They would only make the recall statistics look good. The law does not require that. (Look out for vendors and experts who promote the vain approach of high recall just to line their own pockets.) The law requires reasonable efforts proportional to the value of the case and the value of the evidence. It does not require perfection. In most cases it is a waste of money to try.


In both reviews I stopped the iterative machine training when few new documents were located in the last couple of rounds. I stopped when the documents predicted as relevant were primarily just more of the same or otherwise not important. It was somewhat fortuitous that this point was reached after about the same amount of effort, even though I had only gone through 5 rounds of training in Multimodal, as compared to 50 rounds in Monomodal. I was about at the same point of new-evidence-exhaustion in both reviews and these final stats reflect the close outcomes.

There is no question in my mind that more relevant documents could have been found in both reviews if I had done more rounds of training. But I doubt that new, unique types of relevant documents would have been uncovered, especially in the first Multimodal review. In fact, I tested this theory after the first Multimodal review was completed and did a sixth round of training not included in these metrics. I called it my post hoc analysis and it is described at pages 74-84 of the Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. I found 32 technically relevant documents in the sixth round, as expected, and, again as expected, none were of any significance.

In both reviews the decision to stop was tested, and passed, based on my version of the elusion test of the null set (all documents classified as irrelevant and thus not to be produced). My elusion test has a strict accept-on-zero-error policy for Hot documents. This test does not prove that all Hot documents have been found. It just creates a testing condition such that if any Hot documents are found in the sample, then the test fails and more training is required. In the random sample quality assurance tests for both reviews no Hot documents were found, and no new relevant documents of any significance were found, so the tests were passed. (Note that the test passed in the second Monomodal review even though, as will be shown, the second review did not locate four unique Hot documents found in the first review.) In both elusion tests the false negatives found in the random sample were all just unimportant, more-of-the-same type documents that I did not care about anyway.
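For readers who like procedures spelled out, here is a minimal Python sketch of that accept-on-zero-error elusion test. The function and its inputs are hypothetical illustrations of the protocol described above, not actual project code:

```python
import random

def elusion_test(null_set, sample_size, is_hot, is_relevant):
    """Sample the null set (documents coded irrelevant, not to be produced).
    Accept-on-zero-error policy: any Hot document in the sample fails the
    test and sends the project back for more training rounds."""
    sample = random.sample(null_set, min(sample_size, len(null_set)))
    hot_found = [doc for doc in sample if is_hot(doc)]
    false_negatives = [doc for doc in sample if is_relevant(doc)]
    passed = not hot_found  # a single Hot document means the test fails
    return passed, false_negatives

# Hypothetical usage, sampling 1,183 documents as in the second review:
# passed, misses = elusion_test(null_set, 1183, reviewer_hot, reviewer_rel)
```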

Neither of my Enron reviews was perfect, and the recall and F1 tests reflect that, but they were both certainly reasonable and should survive any legal challenge. If I had gone on with further rounds of training and review, the recall would have improved, but to little or no effect. The case itself would not have been advanced, which is the whole point of e-discovery, not the establishment of artificial metrics. With the basic rule of proportionality in mind, the additional effort of more rounds of review would not have been worth it. Put another way, it would have been unreasonable to have insisted on greater recall or F1 scores in these projects.

It is never a good idea to have a preconceived notion of a minimum recall or F1 measure. It all depends on the case itself, and the documents. You may know about the case and the scope of relevance (although frequently that matures as the project progresses), but you usually do not know about the documents. That is the whole point of the review.

It is also important to recognize that both of these predictive coding reviews, Multi and Monomodal, did better than any manual review. Moreover, they were both far, far less expensive than traditional reviews. These last considerations will be considered in an upcoming blog and will not be addressed here. Instead I will focus on objective measures of prevalence, recall, precision, and total document retrieval comparisons. Yes, that means more math, but not much.

Summary of Prevalence and Comparative Recall Calculations

A total of three simple random samples were taken of the entire 699,082-document dataset, as described with greater particularity in the search narratives. Predictive Coding Narrative (2012); Borg Challenge Report (2013). A random sample of 1,507 documents was made in the first review, wherein 2 relevant documents were found. This showed a prevalence rate of 0.13%. Two more random samples were taken in the second review, of 1,183 documents each. The total random sample in the second review was thus 2,366 documents, with 5 relevant found. This showed a prevalence rate of 0.21%. Thus a total of 3,873 randomly sampled documents were reviewed and a total of 7 relevant documents found.

Since three different samples were taken some overlap in sampled documents was possible. Nevertheless, since these three samples were each made without replacement we can combine them for purposes of the simple binomial confidence intervals estimated here.

By combining all three samples, with a total of 3,873 documents reviewed and 7 relevant documents found, you have a prevalence of 0.18%. The spot projection of 0.18% over the entire 699,082-document dataset is 1,264. Using a binomial calculation to determine the confidence interval, and using a confidence level of 95%, the error range is from 0.07% to 0.37%. This represents a range of 489 to 2,587 projected relevant documents in the entire dataset.
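These numbers can be checked with a few lines of Python. I assume here an exact (Clopper-Pearson) binomial interval, which reproduces the percentages above to within rounding; the 489 to 2,587 document range then follows from applying the rounded percentages to the full corpus:

```python
from scipy.stats import beta

n, k, N = 3873, 7, 699082   # sample size, relevant found, corpus size
p_hat = k / n               # spot prevalence, about 0.18%

# Exact (Clopper-Pearson) 95% binomial confidence interval on prevalence.
lo = beta.ppf(0.025, k, n - k + 1)
hi = beta.ppf(0.975, k + 1, n - k)

print(f"prevalence {p_hat:.2%}, interval {lo:.2%} to {hi:.2%}")
print(f"spot projection over corpus: {p_hat * N:.0f} documents")
# prints roughly: prevalence 0.18%, interval 0.07% to 0.37%
#                 spot projection over corpus: 1264 documents
```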

From the perspective of the reviewer, the low end of the projected range represents the best-case-scenario for calculating recall. Here we know the 489 relevant documents figure is not correct, because both reviews found more relevant documents than that. The Multimodal review found 661 and the Monomodal review found 579. Taking a conservative view for recall calculation purposes, and assuming that the 63 documents considered relevant in one review, and not in the other, were in fact all relevant, this means we have a minimum floor of 955 relevant documents. Thus under the best-case-scenario, the 955 found represents all of the relevant documents in the corpus, not the 489 or 661 counts.

From the perspective of the reviewer, the high end of the projected range in the above binomial calculations – 2,587 – represents the worst-case-scenario for calculating recall. It has the same probability of being correct as the 489 low-range projection. It is a possibility, albeit a slim one, and certainly less likely than the 955 minimum floor we were able to set using the binomial calculation tempered by actual experience.

Under the most-likely-scenario, the spot projection, there are 1,264 relevant documents. This is shown in the bell curve below. Note that since the random sample calculations are all based on a 95% confidence level, there was a 2.5% chance that fewer than 489 relevant documents would be found and a 2.5% chance that more than 2,587 would be found (the left and right edges of the curve). Also note that the spot projection of 1,264 has the highest probability (9.5%) of being the correct estimate. Moreover, the closer to 1,264 you come on the bell curve, the higher the probability of likely accuracy. Therefore, it is more likely that there are 1,500 relevant documents than 1,700, and more likely that there are 1,100 documents than 1,000.

[Figure: bell curve of projected relevant documents, from 489 at the low edge to 2,587 at the high edge, with the spot projection of 1,264 at the peak]

The recall calculations under all three scenarios are as follows:

  • Under the most-likely-scenario using the spot projection of 1,264:
    • Monomodal (Borg) retrieval of 579 = 46% recall.
    • Multimodal retrieval of 661 = 52% recall (that’s 13% better than Monomodal (6/46)).
    • Projected relevant documents not found by best effort, Multimodal = 603.

  • Under the worst-case-scenario using the maximum count projection of 2,587:
    • Monomodal (Borg) retrieval of 579 = 22% recall.
    • Multimodal retrieval of 661 = 26% recall (that’s 18% better than Monomodal (4/22)).
    • Projected relevant documents not found by best effort, Multimodal = 1,926.
  • Under the best-case-scenario using the minimum count of 955:
    • Monomodal (Borg) retrieval of 579 = 61% recall.
    • Multimodal retrieval of 661 = 69% recall (that’s 13% better than Monomodal (8/61)).
    • Projected relevant documents not found by best effort, Multimodal = 294.

In summary, the prevalence projections from the three random samples suggest that the Multimodal method recalled between 26% and 69% of the total number of relevant documents, with the most likely result being 52% recall. The prevalence projections suggest that the Monomodal method recalled between 22% and 61% of the total number of relevant documents, with the most likely result being 46% recall. The metrics thus suggest that Multimodal attained a recall level from 13% to 18% better than that attained by the Monomodal method.
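The recall arithmetic is easy to reproduce in Python. One observation from the sketch: because both methods are divided by the same projected total in each scenario, the exact relative edge is always 661/579, about 14%; the 13% to 18% spread quoted above comes from taking ratios of the rounded recall percentages.

```python
# Recall under the three prevalence scenarios discussed above.
found = {"Multimodal": 661, "Monomodal": 579}
scenarios = {"best-case": 955, "most-likely": 1264, "worst-case": 2587}

for name, total in scenarios.items():
    multi = found["Multimodal"] / total
    mono = found["Monomodal"] / total
    print(f"{name}: Multimodal {multi:.0%}, Monomodal {mono:.0%}, "
          f"relative edge {(multi - mono) / mono:.0%}")
# best-case: 69% vs 61%; most-likely: 52% vs 46%; worst-case: 26% vs 22%
```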

Precision and F1 Comparisons 

The first Multimodal review classified 661 documents as relevant. The second review re-examined 403 of those 661 documents. The second review agreed with the relevant classification of 285 documents and disagreed with 118. Assuming that the second review was correct, and the first review incorrect, the precision rate was 71% (285/403).

When the content of these documents is examined, and the duplicate and near duplicate documents are removed from the analysis as previously explained, the Multimodal review classified 369 different unique documents as relevant. The second review re-examined 243 of those 369 documents. The second review agreed with the relevant classification of 211 documents and disagreed with 32. Assuming that the second review was correct, and the first review incorrect, the precision rate was 87% (211/243).

Conversely, if you assume the conflicting second review calls were incorrect, and the SME got it right on all of them the first time, the precision rate for the first review would be 100%. That is because all of the documents identified by the first review as relevant to the information request would in fact stand confirmed as relevant. As discussed previously, all of the disputed calls concerned ambiguous or borderline grey area documents. The classification of these documents is inherently arbitrary, to some extent, and they are easily subject to concept shift. The author takes no view as to the absolute correctness of the conflicting classifications.

The second Monomodal review classified 579 documents as relevant. The first review had examined 323 of those 579 documents; it agreed with the relevant classification of 285 documents and disagreed with 38. Assuming that the first review was correct, and the second review incorrect, the agreement rate on relevant classifications was 88% (285/323).

When the content of these documents is examined, and the duplicate and near duplicate documents are removed from the analysis as previously explained, the Monomodal review classified 427 different unique documents as relevant. The first review had examined 242 of those 427 documents. The first review agreed with the relevant classification of 211 documents and disagreed with 31. Assuming that the first review was correct, and the second review incorrect, the precision rate was again 87% (211/242).

Assuming instead that the conflicting first review calls were incorrect, and the SME got it right on all of them the second time, then again the precision rate for the second review would be 100%. That is because all of the documents identified by the second review as relevant to the information request would in fact stand confirmed as relevant.

In view of the inherent ambiguity of all of the documents with conflicting coding, the measurement of precision in these two projects is of questionable value. Nevertheless, assuming that the inconsistent codings were always correct, when you do not account for duplicate and near duplicate documents the second Monomodal review was 24% more consistent with the first Multimodal review. However, when the duplicate and near duplicate documents are removed for a more accurate assessment, the precision rates of both reviews were almost identical at 87%.
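For completeness, the agreement arithmetic above reduces to a few divisions, sketched here in Python for anyone who wants to check the rates:

```python
# Cross-review agreement rates (agreed / re-examined) from the text above.
pairs = {
    "Multimodal, all documents": (285, 403),
    "Multimodal, deduplicated":  (211, 243),
    "Monomodal, all documents":  (285, 323),
    "Monomodal, deduplicated":   (211, 242),
}
for label, (agreed, examined) in pairs.items():
    print(f"{label}: {agreed}/{examined} = {agreed / examined:.0%}")
# prints 71%, 87%, 88%, 87%, matching the rates reported above
```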

The F1 measurement is the harmonic mean of the precision and recall rates. The formula for calculating the harmonic mean is not too difficult: 2/(1/P + 1/R), which simplifies to 2PR/(P + R), where P is precision and R is recall. Thus using the more accurate 87% precision rate for both, the harmonic mean ranges for the projects are:

  • 40% to 77% for Multimodal
  • 35% to 71% for Monomodal

The F1 measures for most-likely-scenario spot projections for both are:

  • 65% for Multimodal
  • 61% for Monomodal
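Here is the same harmonic mean calculation as a short Python sketch. Its outputs agree with the figures above to within a percentage point, with the small differences tracing to where one rounds the recall rates before taking the mean:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)

P = 0.87  # deduplicated precision, identical for both reviews
print(f"Multimodal: {f1(P, 0.26):.0%} to {f1(P, 0.69):.0%}, spot {f1(P, 0.52):.0%}")
print(f"Monomodal:  {f1(P, 0.22):.0%} to {f1(P, 0.61):.0%}, spot {f1(P, 0.46):.0%}")
# Multimodal: 40% to 77%, spot 65%; Monomodal: 35% to 72%, spot 60%
```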

In summary, since the precision rates of the two methods were identical at a respectable 87%, the comparisons between the F1 rates track the comparisons between the recall rates. The Multimodal F1 of 40% for the worst-case-scenario was 14% better than the Monomodal F1 of 35%. The Multimodal F1 of 77% for the best-case-scenario was 8% better than the Monomodal F1 of 71%. The most likely spot projection differential between 61% and 65% again shows Multimodal with a 7% improvement over Monomodal.

Comparisons of Total Counts of Relevant Documents

The first review using the Multimodal method found 661 relevant documents. The second review using the Monomodal method found 579 relevant documents. This means that Multimodal found 82 more relevant documents than Monomodal. That is a 14% improvement. This is shown by the roughly proportional circles below.

[Figure: roughly proportional circles comparing the 661 and 579 relevant document counts]

Analysis of the content of these relevant documents showed that:

  • The set of 661 relevant documents found by the first Multimodal review contained 292 duplicate or near duplicate documents, leaving only 369 different unique documents. There were 74 duplicates or near duplicates in the 285 documents coded relevant by both Multimodal and Monomodal, and 218 duplicates in the 376 documents that were only coded relevant in the Multimodal review. (As the most extreme example, the 376 documents contained one email with the subject line Enron Announces Plans to Merge with Dynegy dated November 9, 2001, that had 54 copies.)
  • The set of 579 relevant documents found by second Monomodal review contained 152 duplicate or near duplicate documents, leaving only 427 different unique documents. There were 74 duplicates or near duplicates in the 285 documents coded relevant by both Multimodal and Monomodal, and 78 duplicates in the 294 documents that were only coded relevant in the Monomodal review. (As the most extreme example, the 294 documents contained one email with the subject line NOTICE TO: All Current Enron Employees who Participate in the Enron Corp. Savings Plan dated January 3, 2002, that had 39 copies.)
  • Therefore when you exclude the duplicate or near duplicate documents the Monomodal method found 427 different documents and the Multimodal method found 369. This means the Monomodal method found 58 more unique relevant documents than Multimodal, an improvement of 16%. This is shown by the roughly proportional circles below.

[Figure: roughly proportional circles comparing the 427 and 369 unique relevant document counts]

On the question of effectiveness of retrieval of relevant documents under the two methods, it looks like a draw. The Multimodal method found 14% more relevant documents, and likely attained a recall level from 13% to 18% better than that attained by the Monomodal method. But after removal of duplicates and near duplicates, the Monomodal method found 16% more unique relevant documents.

This result is quite surprising to the author who had expected the Multimodal method to be far superior. The author suspects the unexpectedly good results in the second review over the first, at least from the perspective of unique relevant documents found, may derive, at least in part, from the SME’s much greater familiarity and expertise with predictive coding techniques and Inview software by the time of the second review. Also, as mentioned, some slight improvements were made to the Inview software itself just before the second review, although it was not a major upgrade. The possible recognition of some documents in the second review from the first could also have had some slight impact.

Hot Relevant Document Differential

The first review using the Multimodal method found 18 Hot documents. The second review using the Monomodal method found only 13 Hot documents. This means that Multimodal found 5 more Hot documents than Monomodal. That is a 38% improvement. This is shown by the roughly proportional circles below.

[Figure: roughly proportional circles comparing the 18 and 13 Hot document counts]

Analysis of the content of these Hot documents showed that:

  • The set of 18 Hot documents found by the first Multimodal review contained 7 duplicate or near duplicate documents, leaving only 11 different unique documents.
  • The set of 13 Hot documents found by the second Monomodal review contained 6 duplicate or near duplicate documents, leaving only 7 different unique documents. Also, as mentioned, all 13 of the Hot documents found by Monomodal were also found by Multimodal, whereas Multimodal found 5 Hot documents that Monomodal did not.
  • Therefore when you exclude the duplicate or near duplicate documents the Multimodal method found 11 different documents and the Monomodal method found 7. This means the Multimodal method found 4 more unique Hot documents than Monomodal, an improvement of 57%. This is shown by the roughly proportional circles below.

[Figure: roughly proportional circles comparing the 11 and 7 unique Hot document counts]

Conclusion

On the question of effectiveness of retrieval of Hot documents, the Multimodal method did 57% better than Monomodal. Thus, unlike the comparison of effectiveness of retrieval of relevant documents, which was a close draw, the Multimodal method was far more effective in this category. In the author’s view the ability to find Hot documents is much more important than the ability to find merely relevant documents. That is because in litigation such Hot documents have far greater probative value as evidence than merely relevant documents. They can literally make or break a case.

In other writings the author has coined the phrase Relevant is Irrelevant to summarize the argument that Hot documents are far more significant in litigation than merely relevant documents. The author contends that the focus of legal search should always be on the retrieval of Hot documents, not merely relevant documents. Losey, R., Secrets of Search – Part III (2011) (the 4th secret). This is based in part on the well-known rule of 7±2 that is often relied upon by trial lawyers and psychologists alike as a limit to memory and persuasion. Id. (the 5th and final secret of search).

To summarize, this study suggests that the hybrid multimodal search method, one that uses a variety of search methods to train the predictive coding classifier, is significantly more effective (57%) at finding highly relevant documents than the hybrid monomodal method. When comparing the effectiveness of retrieval of merely relevant documents, however, the two methods performed about the same. Still, the edge in performance must again go to Multimodal because of its 7% to 14% better projected F1 measures.

