This is a restatement, in one blog post, of a story I just published in nine parts. The story took on a life of its own and grew into a 20,000 word novella. This quasi-fiction is about a legal search project in the not too distant future. It is just another of my ongoing efforts to teach what I know about legal search in an interesting manner.
The first e-Discovery Team science fiction saga appeared on June 10, 2012. That blog was called A Day in the Life of a Discovery Lawyer in the Year 2062: a Science Fiction Tribute to Ray Bradbury. The story involved a legal search project in the year 2062. It was a first person narrative by the young attorney in charge. The future discovery lawyer described his admittedly very far out usage of the latest multi-sensory search software. It did not really teach or say that much about legal search today.
This second attempt at e-discovery science fiction, like the first, again combines law, information science, and artificial intelligence based search software. It goes into greater detail about what advanced legal search is like, and about the practice of law in the early 21st Century. Since the search project is set just a few years (months?) into the future, the descriptions are much more realistic than my first science fiction. In fact, some may argue that the software needed to do the project already exists.
Before I start the story I must apologize to all Star Trek fans. I am unworthy to write a bona fide Star Trek novel (all I can do are simple cartoons), so rest assured that this story will not really have anything to do with the beloved genre. Instead, only some general aspects of the Borg theme will be borrowed, including the so-called hive mind. Otherwise this not so subtle attempt to teach lawyers about current-day predictive coding will have nothing to do with Star Trek.
For those of you who do not know who the Borg are, much less their hive, such knowledge is not really necessary to enjoy the story, but it wouldn’t hurt either. You might want to watch a bunch of videos on who and what the Borg are, at Never Heard of Star Trek’s Borg?
Without further ado, here is the complete e-tech law instructional novel, Journey into the Borg Hive.
An e-Discovery Project in the Not-Too-Distant Future
My Saturday night was interrupted by a high priority message from our law firm president. We had won the Google RFP. We had been retained to handle the big China Space case. He mentioned in passing that all of our proposed terms had been accepted, except one. I later found out that Google did not counter on price, as we had feared. Instead, they asked for a revision to our discovery specifications. The partners in charge of the bid were not concerned. Why should they be? The variance just concerned a software and search protocol, and required use of a particular vendor. It only affected me, the discovery lawyer, not them. As they put it to me in the internal kick-off video conference on Monday, they knew I could handle it, especially since I was such an expert in those fields. I hate it when they do that.
Everyone in the firm was happy to get the case. I knew better than to rain on the parade during the video conference, so I smiled and said: Yeah sure, I’ll deal with it. Still, I wanted the whole team to understand how difficult and risky this new protocol could be. It could doom the whole case. I asked them to read a memo to file that I had already written about the possible impact of the change. I wanted to go on and explain why it was a problem, but knew that they were not that interested. After all, they hired me to do the e-discovery so that they would not have to deal with it. But, at least they said they would read the memorandum.
Google Picks a Borg Vendor
The change Google requested seemed minor to everyone but me. I understood right away that this was a dangerous Borg type protocol. It only used predictive coding, and in my opinion, minimized the skill and input of the lawyer reviewers.
Why didn’t Google tell us about this protocol demand in advance? Seemed like bait and switch to me and I was pissed. I later found out that a vendor had gotten to the general counsel. The CEO herself personally met with Google’s GC. She had talked him into the special protocol at the last minute by giving him some sort of special deal, or something.
Protests and CYA
I channeled my anger by writing a memorandum to file that summarized the risks involved with the Borg protocol. I also insisted that my concerns be shared with the client. Although the Google client lawyers all objected, senior management overruled. A watered down, very diplomatic version of my memo was sent to Google’s GC on Tuesday. The memo explained how the firm’s normal multimodal search protocol was commonly accepted in the industry and generally considered as far superior to the monomodal Borg approach specified in the new protocol. It also raised questions about the software of the new vendor Google retained. They were hired to do the non-legal e-discovery work in the China Space suit that Google’s own IT couldn’t handle.
The memo led to a short call on Wednesday with Google’s litigation counsel, Linda. As luck would have it, she was a fan of my blog. There were about a dozen other attorneys on the phone too, for reasons unknown, from both my firm and Google. We quickly learned that Linda was surprised by her GC’s decision to go with the vendor’s proposal. You could tell that, like me, she was a tad miffed about not being included in the decision.
I pointed out that the Borg protocol was risky and could lead to sanctions if it missed key documents, especially in this court. I said that relying on predictive coding alone was like jumping out of an airplane without an emergency parachute; that we needed the safety of the other search protocols. I also complained that it might take much longer than the multimodal approach outlined in our proposal. As far as I knew, no one had ever used a pure Borg approach in a major case. I knew I hadn’t. The approach seemed irrational. Why not use all of the search tools at your disposal? Why minimize the creative input of lawyers to find relevant documents, at least for the first seed set? It seemed crazy to me to just rely on random selection for the first training set, all on the pretext of eliminating human bias, myopia, inconsistencies, and mistakes. Those were all straw men, while the inefficiencies of not jump starting the machine training with known relevant documents were all too real.
I was very concerned about putting too much trust into the machine and machine learning. I was also suspicious of this vendor, and of any software that had only predictive coding features. Of course it was cheaper than a full service search engine; all it could do was machine learning. What a sales job they must have done on the GC. I kept that last comment to myself, but it was implied by my overall tone.
I was preaching to the choir with Linda. She agreed with everything, but said her hands were tied by marching orders from above. She asked me to do the best I could and try to work with this vendor. She promised to work with us on expenses, but pointed out, correctly, that we had accepted the revision without any request to change the discovery budget. It looked like I was stuck with it, but at least I had a sympathetic ear with the client.
Still, I covered myself one more time with a follow-up memo to the case partners, copy to the head of litigation. I explained that the Borg search protocol could cost much more than the multimodal approach I had priced into the e-discovery budget. I was told not to worry, that the rest of the case budget was high enough to absorb some losses from e-discovery. That was comforting, but I did not want to see my department in red ink, while everyone else was in the black. I was used to e-Discovery carrying its own weight in the profit department. I hated to have that depend on whether a Borg approach with strange software would work or not.
China Space, Inc. v. Google, Inc.
This was a big case involving contract disputes between seven different departments of China Space, Inc., and Google, Inc. As usual, both sides also pled tort claims in the alternative. It had to do with some kind of technology they were both working on. Thank God neither side had any patent claims. I hate to deal with all of the special e-discovery rules those courts are always inventing. I still remember the first patent court rules that required you to use five keyword search terms. Seems really funny in retrospect.
One reason I still liked this case, even if it did have a Borg challenge, was our judge. We were in the hottest district in the country and all discovery issues in the case had already been assigned by the District Court Judge to one of the country’s top magistrate judges. The Magistrate had already noticed a 16(b) hearing devoted to e-discovery topics. I had only two weeks to prepare. Oh well, far better than the thirty minutes advance notice I sometimes get when on helicopter duty. I’m sure you know what that is.
We had known about the Magistrate Judge when we bid the case, but I did not think we’d be trying out some experimental protocol in his court. His expertise could, of course, cut both ways. We would have to see. So much depended on whether the Borg approach worked, or, more likely, just how badly it failed. At least if it all went bad, the firm’s reputation, and my own, would remain intact. The judge would know full well that we were not using our regular vendor. He would know that the experiment was the client’s idea, or at least the client’s vendor’s idea. I would be sure that the judge understood the difference. In that way, if the worst did happen, we could try to deflect sanctions against our client (and us), and point the blame on the vendor’s software. I would make clear, if need be, that the only bad faith here was by the vendor’s salesmen.
Yes, even if their software warranty limits and disclaimers did hold up, the new vendor had a lot at stake here. They could lose big, or surprise everyone and win big. Maybe that’s why they lowballed their bid to get the contract. They wanted a chance to prove themselves, their method and software. They also knew I would have no choice but to go along and try my best to protect them in court. Like it or not, they were my experts now, and the Borg way is the search protocol that my client ordered me to follow. Resistance was futile, but still, I didn’t have to like it.
Google’s Bulk Self-Collection
The vendor was not involved with the collection. Google’s engineers had that well in hand. It was completed a week ago and had already been processed and loaded onto the vendor’s computers. Google IT did all of the collection, not the actual custodians. They copied everything that met the specified criteria, including time, file type and location.
Four hundred and eighty-five gigabytes (485 GB) of documents were collected by Google from a total of 45 custodians. These were the Class-A custodians in the case, the ones we knew the opposing party and judge would want included in the first phase of discovery. Altogether, after deduplication, etc., there were over 4,000,000 documents from these custodians. There was an additional 1.5 terabytes of ESI collected from the one hundred or so Class-B and C custodians, but their data was not loaded into the platform. We might look at their ESI in future discovery phases.
I would also usually load a few mock-docs onto the search and review platform at this point. By mock-docs I mean fictitious documents that might exist, or should exist, if either side’s story was absolutely correct. We would create these documents, carefully mark them so they were never produced, and then load them as machine training documents. But I was told that did not meet protocol here. Too bad. I had found that these mock-docs were a great way to jumpstart machine learning.
I had also found that their creation was an effective way to get the trial team to focus on what they considered the key facts in dispute to be. One of my favorite questions to ask a trial lawyer is, what would a smoking gun look like in this case? What would an email or other document look like that would ruin your case? I would also ask them to think positive, not easy for trial lawyers, and think about what a silver bullet in this case would look like. What document would make your case for you? What words would it use? What concepts would it convey? Who would likely have said it? Anyone else? When?
I can quickly tell a really good lawyer by how quickly they catch on to the mock-doc game. The good ones already know what they need to prove and disprove in the case. They already know what documents they would like to show to a jury to win the case. In some cases I had even agreed to allow the requesting party to submit mock-docs and use them to help guide our searches.
My First Meetings with the Borg Vendor
My first meeting with the Borg vendor started with the usual pleasantries. There were four or five of them on the phone, although after introductions only two of them ever said anything. I later learned they always worked in big groups like that. Part of the collective mentality I suppose. The CEO who had sold the project to the GC was not on the call. Naturally I had Googled all of them in advance of the meeting. They looked like they had not been outdoors in months; pale and pasty-faced, but they had good backgrounds in technology. Only one of them also had a law degree: the CEO, and so far she was a no-show. Too bad, her Google results were very interesting.
The vendor knew where I stood on the Borg approach from my blog. The CEO was a regular reader and sometime commentator. After a few minutes of initial pleasantries, I talked about how I usually add mock-docs for training at the beginning of a project. We talked about whether there was any place for that in their software. They talked it over, and after a few possibilities were discussed, they decided it could not be done. Oh well. At least I tried. I would have to figure out another way to get my trial lawyers to focus on relevance and closing arguments. I still had only a vague idea of relevance, and I had been talking to them for hours.
The vendor team explained their approach to predictive coding, which they called fully automated. They did not seem amused when I called it the Borg approach. When I brought out the old jump without a backup parachute argument, they asked me if I drive my car with a bicycle in the trunk. I had to smile at that comeback. If you have a car, they said, you don’t need a bike. Of course, I agreed with them to a point. The other forms of search – keyword, concept, similarity, etc. – were not nearly as effective as machine learning. Still, a skilled lawyer’s use of the other search methods, sometimes even including intuition and good luck, could help the machine learn by providing good examples of relevant documents to pattern and train on. It could vastly improve the initial seed set, if nothing else.
They said their CAR didn’t need a bicycle’s help. They kept telling me how much I was going to love their software, how easy the whole process was. They called it the lazy man’s approach to predictive coding. I remained unimpressed, but did my best to try to keep an open mind. Who knows? Maybe it would even work.
Borg Philosophy
We ended the first meeting by setting up a series of software training sessions on the actual data. Then I insisted on more meetings right after that with their top experts. They had one consultant I especially wanted to meet. She had a PhD in information science from the top school in legal search, so I wanted to hear her views. I did not expect to be able to talk her into multimodal. I knew that most information scientists did not appreciate lawyer search skills. They think that lawyers only want to shape the truth, and will use any trick to ignore or even bury facts not favorable to their client. They do not appreciate how seriously we take ethics and keeping our reputation intact. They think we are only clever manipulators of facts, not bona fide discoverers of facts.
This anti-lawyer attitude, which at its core is a Borg-like, anti-human attitude aimed at more than just lawyers, is one reason some scientists and businessmen are anxious to find a fully automated process. They seem genuinely annoyed by the unavoidable fact that lawyers are needed as the final arbiters of what is relevant or not.
They do not understand that the best discovery lawyers do not hide the facts, they explain them. Their clever intents are directed not at the facts, but at the law. Lawyers are sworn to get the truth, the whole truth, and nothing but. And then to go on and win the case anyway. In theory the best scientists are the same way, only they are usually not as concerned with the expense and burden of search as the lawyers.
Random Sampling Setup For SME
In my later meetings with their experts I learned more about the search methods built into the protocol. As to probability statistics and random sampling, they were using a 2% confidence interval, and, of course, a 95% confidence level. That meant an initial sample size of 2,401 documents out of the 4,000,000. As the SME for the case I was going to have to singlehandedly review all of these myself, both in the initial sample and then again another 2,401 in the final quality assurance test sample.
I had no problem with that seemingly big burden. Although many contract review companies will tell you to assume that only 50 files per hour can be reviewed safely, I knew better. They said it might take me 48 hours to do that random review by myself (2401/50), and suggested I use two of their best reviewers instead. I did not go for that. I knew from experience that their time estimate was way off. Plus I knew that with multiple reviewers inconsistencies were bound to crop up. Also, I did not think their reviewers would have even close to my expertise and experience on relevancy. After so many years as a litigator I could spot the unexpected and see many ways that a document could be relevant or irrelevant. I wanted to be sure that the initial baseline random sample was done correctly.
Besides, random samples are quick and easy to review. That’s because 98%, 99%, and sometimes even more than 99% of the documents sampled are usually irrelevant in a case like this. After a while it becomes very easy to quickly recognize and label irrelevant documents. (The hard part is in spotting the rare relevant. That is where the legal skills and experience come in.) I could usually run through 200 documents in a random sample in a half hour. That is how I worked. In half hour bursts. I found that was the best way to maintain maximum concentration and efficiency. Since my hourly rate was over ten times that of an average contract lawyer, I wanted to be sure the client got its money’s worth. I thought I could probably do the review part at a speed of 400 files per hour, and thus review all 2,401 samples in six hours.
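For the curious, the vendor’s numbers are easy to double-check. Here is a minimal sketch in Python; the function name and the optional finite population correction are my own illustrative additions, not anything taken from the vendor’s software:

```python
import math

def sample_size(z, margin, population=None, p=0.5):
    """Simple random sample size for estimating a proportion at a given
    confidence level (expressed as its z-score) and confidence interval
    (margin of error), using the worst-case variability assumption p = 0.5."""
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction; it barely matters at 4,000,000 docs.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

# 95% confidence level (z ≈ 1.96) with a 2% confidence interval:
print(sample_size(1.96, 0.02))        # 2401
# At my 400 files per hour, that sample takes about six hours:
print(sample_size(1.96, 0.02) / 400)  # ≈ 6.0
```

At the vendor’s assumed 50 files per hour, the same sample works out to the 48 hours they quoted (2,401 ÷ 50 ≈ 48).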
QC of the Borg Cube
I also wanted to be sure this new vendor knew how to properly set up and sample the database. I verified that they were going to exclude all documents that lacked text, or lacked enough text for predictive coding to work. The sampling works best when there are strict limits placed on the sample pool. The sample is only of meaningful documents for machine learning. I had a side team of trusted reviewers to search and code the non-text types of ESI. There were not really that many. Besides, if an image or some other non-searchable computer file was part of an email, or other document found to be relevant, then under our standard protocols it would automatically be dragged back into the final production.
After the initial random sample, portions of which would also be used by the software for testing and training, the computer code would kick in. It would start to feed my review team batches of 400 documents at a time. A certain percentage, which I understood varied but was always less than 20%, would be selected randomly. The rest would be selected by the computer. Like most other predictive coding software I had used, it was designed to select documents that its analysis showed would most benefit from human classification. That usually meant documents in the 25% to 60% probable relevant range, but not always.
They had a review team of eight lawyer review specialists set up and ready to begin study of the briefs and relevancy notebook that my attorneys were working on. I reminded the vendor that I was required both by contract with the client, and by legal ethics, to personally supervise the work of their contract lawyers. I insisted on daily video conferences with the reviewers. That was not too difficult in this case since all of them were together in the same room at the same time. Their first assignment would be to review the same 2,401 random sample documents that I was reviewing, but to look for confidentiality and privilege concerns, not relevancy.
There would be no need for this vendor work at all but for the fact that the Magistrate Judge was known to require disclosure of random tests and seed sets. His local rules also gave us the right to withhold and log any irrelevant and confidential or privileged files. We planned to do that very carefully, which is why we devoted the vendor’s entire review team to it. We knew we had an ethical duty to protect our client’s confidential data. This was one reason we never volunteer for such full disclosure. But with this case, in this court and this judge, that was how it was going to be. So we would at least make sure that all privileged and confidential documents were spotted, stamped, redacted or withheld and logged. We were going to protect the confidentiality rights of our clients, their employees and customers. The last thing we wanted to do was have to rely on our clawback agreement and order.
The vendor agreed with phony vigor to all of my supervision demands of their attorneys. They had no choice. It is illegal for them to practice law. They are a commercial corporation, not a law firm. Plus, they figured I would probably be like most of the lawyers they worked with. I would start off talking a good game, and then slack off on supervision as the project commenced and deadlines loomed. They were in for a surprise. The last thing I would do on this project is take it easy. I did not trust the Borg.
We spent the rest of the meeting assigning agenda items to begin preparation for the first 16(b) hearing on e-discovery the following week. I delegated everything to other associates and partners on my e-discovery team except for the initial review of the sample documents. I hoped I was right about the low yield assumptions in the 2,401 random documents. Otherwise my six hours of review could easily turn into sixty.
Review of the 2,401 Sample
I started the project by review of all 2,401 samples at once, rather than taking them in sets of 200 as the vendor recommended. That was for amateurs. I knew I could take advantage of display sort alternatives by doing the entire group at once. I would use those methods to quickly knock off certain files with bulk-coding. I called that my extended spam-type culling. (Regular spam had all supposedly been caught by the client’s spam filters, but many always get through, even at Google.) Spam for me meant any type of file that could not be relevant. It was like a custom de-Nisting. I also knocked off most of the newsletters that way. Yup, even Google employees still subscribed to a few, not to mention Amazon orders.
Before I began the bulk coding I looked over the whole set using various visual displays. I looked at the most popular file types, email subject headers, and the like, including the obligatory user display with the connecting lines. I also glanced at the time line display. Then I sorted by file size to quickly review and knock off the too-small for meaningful content files. One actually had content (I still quickly glanced at each to be sure), but it was irrelevant minimalist content all the same.
Only then did I do the extended spam, custom de-Nisting type bulk coding. After that I settled in for an alphabetical listing where I could easily see all of the similar emails and code them. I found one large collection that was some kind of group file having to do with internal IT announcements. Like most companies the IT department cluttered everyone’s email with tons of important announcements, including maintenance, upgrades and virus alerts. Second were the stupid HR announcements, and then the obligatory celebration emails.
After the first hour of fun work like that, where I knocked off the easy stuff and got a feel for the database, I settled in for the hard grind, the remaining approximate 2,000 emails that required individual scanning to determine relevance. I put the software into power-review display mode, the one that maximizes the display size of the document under review. Then I began to use the hot key combinations for the irrelevant and training codes. Hot keys were much faster than using a mouse to check boxes and click. Mouse work involved two steps and sometimes it could take a millisecond to get the cursor in the right position. Hot keys were always much faster, unless you were bulk coding. Still, occasionally I had to use my mouse to move around and see something outside of the default view.
I could do this review one-handed if I wanted, and would sometimes do so for a change, or to drink coffee or something. But I found it was faster to use both hands and keep them both hovering above the keys to tap and tag a document as irrelevant and move on. So that was my standard technique. I could code some documents that way in a second. Of course, this required very high-speed Internet. This was a necessity for me, not a luxury. So too were the latest computers I used to do the review. Sometimes I would use multiple monitors, but often I would just use my favorite laptop, the new MacBook Pro, and move around for variety; all wireless of course.
Attaining Ideal Work Flow
My favorite spot to code in the Winter was next to the pool. With global warming Florida was an especially good place to code outside and catch some sun at the same time. I’m a big believer in multitasking, but when I review, my focus is 100% on the documents and keeping my mind open to evaluate any possible relevance. The millisecond the next document enters my consciousness the mental calculus begins as to whether there is any way it could be relevant. Sometimes I use music to help keep the concentration levels high, sometimes not.
Every now and then I would encounter a document that took a while to recognize and identify as irrelevant or not. Sometimes that would require paging down to the bottom of a document not in view, or changing the page orientation ninety degrees (although I usually just twisted my head sideways as that was slightly faster). Sometimes it would even require moving on to the next pages, to see earlier strings in an email chain, or later portions of a Word doc or a PowerPoint. But that was rare. The screen was large enough to see most of the document in one glance.
I had trained myself over decades of computer use, including thousands of hours of gaming, to take in a screen all at once, and speed-read words where necessary. I am a big reader anyway, so this was all second nature to me. I also had a few highlights turned on for certain key words, and every now and then this also helped for quick recognition. Usually, however, a detailed read was not necessary. A quick glance told all.
I had read so many emails of so many different people over the years that I knew the patterns. I knew the kinds of things people would say and not say in emails, which still constitute the bulk of all my review work. I had seen it all before.
I was moving fast but I was not in a hurry. That is always a mistake. It is better to be in a very relaxed timeless state, what Csikszentmihalyi calls Flow. Sometimes the flow was strong and nothing could stop it. Other times it was weak and easy to disrupt. Sometimes even a slight hangup in page reloading would disrupt the flow.
These hiccups were a common occurrence. I was, after all, working remotely on a computer that was located over 2,000 miles away. Even though I had the best Internet connections, there would sometimes be slight delays, usually just a second or two, but sometimes longer. I trained myself to take mental mini-vacations when that happened, rather than get annoyed. That helped keep up the timeless, concentrated flow. I’d just rest with a quick mind-blank. Maybe I’d look around or rub my eyes. It was critical that you kept your mind alert and open and also important that you not over-strain your eyes.
I have trained myself over the years to take as long as needed to ID a doc, but no longer. Sometimes I would change the work flow and put electronic sticky notes on a document that I might want to refer to in the future for some reason, such as to explain my reasoning on a close question, but that was rare. I did not rush and I was always careful to be careful. I did not want to make a mistake.
This was not a race, it was a search for evidence in an important legal proceeding. If it took five minutes or longer to ponder a document, then so be it. If I had to go back and double check a document I had already marked, then so be it. Don’t go so fast that you lose your ability to stop suddenly where necessary to be sure. Some documents were tricky and hard to identify. Sometimes a document would look irrelevant at first glance, but upon further reading you would see a relevant statement. You needed to be sure of your coding before you clicked to the next document. This took a high skill level to do right. Every experienced reviewer knows that.
When I did finally run into a relevant document I would read it very carefully and think about its significance, noting its time, the persons involved, and language used. I would also evaluate its weight. If it was very important, I would mark it as Highly Relevant, but this was rare. I did not find any hot documents in this random sample review.
I also trained myself over the years not to be distracted by interesting irrelevant documents, typically emails with jokes or sexual references. Sometimes you would run into glimpses of human drama, emergencies with kids, nasty comments. It was tempting to read on, but better not to. Every now and then you also would run into a custodian with too much personality. You know the type I’m sure. They often use politically incorrect speech and are outspoken. I try just to make a mental note of the loose-lipped custodian and move on. Sometimes I break the iron discipline and spend a minute to amuse myself by reading an irrelevant email. It was rare, but happened. I knew from experience that could be a real time killer, so it usually took quite a bit to get me to linger. It is better not to get distracted in any way or start fantasizing about the content, or the custodians. That slows you way down.
I was concerned about efficiency, but the main thing in review is always accuracy, not speed, especially in the first random set review. It is important to set up the initial benchmark correctly. An inaccurate prevalence calculation could throw all of the quality assurance tests off, not to mention slow down the machine learning. I took my time and cruised through at an average of 200 files per half hour.
I would take a break every thirty minutes or so, sometimes longer. I did not allow phone interruptions, or people interruptions, unless I was on a break. I preferred to work alone. I often wore headphones. Sometimes a break would last an entire day, either for personal reasons, like a family event on a weekend, or sometimes the break was work induced. Another case would interrupt and take priority. That is just the way the practice of law goes. I had no trouble picking back up the one time that happened here. A break for several days can throw you off, but just one day is not a big deal for zoning back into the relevancy gestalt of this case.
First Review Task Completed
I completed the review in three days, including the lost day. It took me seven hours of review time, just a little more than expected, but not much. I averaged 343 files per hour, not bad, but hardly my best time. Note I do not include preparation of my obligatory memo to file as part of my review time, nor the periodic quality control work I built into all of my reviews. That is part of analysis and is separately described in final billings. Even though this was a flat fee, we still kept track of time and task descriptions.
My concern that this collection might be unusual and have a high prevalence rate was misplaced. This was a typical low yield collection and so the review went about as fast as expected. I knew low prevalence was the norm in legal search, and it would only get worse in the future as the data explosion continued. The more documents, the larger the haystacks, the lower the percentage of relevant needles. Although the amount of relevant information, the needles so to speak, was also expanding, it was not keeping pace with the overall explosion of data. The information noise is always growing faster than the music. Plus, human limits on persuasion remained; the seven plus or minus two rule of trial practice would never change as long as the jury trial remained a fundamental right. The number of electronic communications was ever-increasing, but not discussions of topics relevant to most litigation. The issues raised in the dispute between Google and China Space were many, but few communications in any way concerned them.
Analysis of the Sample
My next step was to analyze the metrics of my sample review. My review of the 2,401 sample found ten hits, ten relevant documents. None highly so. A little high in my experience, but not unexpected because this was a big, complicated case and the trial boys still hadn’t narrowed any issues. They were waiting for the first mediation and court rulings.
The calculation required for the first baseline metric of prevalence is simple, just division: 10 ÷ 2,401 = .00416493127863. I rounded off and called it a prevalence rate of 0.42%. Since we were dealing with a data set (corpus) of 4,079,293 computer files, that meant the spot projection of the number of relevant documents that were likely contained in the corpus was 17,133. Our model of perfection was to retrieve that many relevant documents from the corpus. But unlike the Borg in Star Trek, I did not have to attain perfection. Reasonable, proportionate efforts are all that are required by human law.
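The baseline arithmetic is simple enough to sketch in a few lines of Python. The figures are the ones from the story; the variable names are my own:

```python
# Prevalence and spot projection from the random sample review.
sample_size = 2_401        # documents in the random sample
relevant_hits = 10         # documents coded relevant in the sample
corpus_size = 4_079_293    # total files in the collection (corpus)

prevalence = relevant_hits / sample_size          # 0.00416493...
rounded = round(prevalence, 4)                    # 0.0042, i.e. 0.42%
spot_projection = round(corpus_size * rounded)    # projected relevant files

print(f"prevalence {prevalence:.4%}, rounded {rounded:.2%}, projection {spot_projection:,}")
```

Note that the 17,133 figure comes from multiplying the corpus size by the rounded 0.42% rate, which is why it differs slightly from the unrounded projection.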
To get a more exact confidence interval for a low prevalence dataset like this we needed to make binomial calculations. These show that the 95% probability interval ran from 0.20% to 0.76%. Remember, the center point is 0.42%. Anything less than 0.20% or more than 0.76% would have less than a 2.5% probability.
This means the document spread around our peak probability number of 17,133 relevant documents ran from 8,159 to 31,003 documents. There was a small chance that there were only 8,159 relevant documents in the dataset, and an equally unlikely possibility that there were as many as 31,003. More or fewer documents than these extremes would fall outside of the 95% confidence interval. They are contained in the extreme tails at either end of the bell curve, shown in blue in the graph, tails that theoretically stretch out to infinity.
The closer to 17,133, the higher the probability of accuracy, with 17,133 itself having the highest probability (9.5%) of being the correct estimate.
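For readers who want to check the interval, here is a rough sketch of an exact (Clopper-Pearson) binomial interval using only the Python standard library. The function names and the bisection approach are my own illustration, not anything from the story's vendor software:

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly with math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided binomial confidence interval, found by bisection."""
    def bisect(f, lo, hi):
        for _ in range(60):            # 60 halvings gives ample precision
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if f(mid) else (lo, mid)
        return (lo + hi) / 2
    # Lower bound: the p at which P(X >= x) rises to alpha/2.
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2, 0.0, x / n)
    # Upper bound: the p at which P(X <= x) falls to alpha/2.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > alpha / 2, x / n, 1.0)
    return lower, upper

lo, hi = clopper_pearson(10, 2401)
print(f"95% interval: {lo:.2%} to {hi:.2%}")   # compare with the 0.20% and 0.76% figures above
```

Running this reproduces an interval consistent with the 0.20% to 0.76% range described in the text, centered near the 0.42% point estimate.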
Meeting with the Borg Queen
I finally met Siri, the notorious CEO of the vendor. I was glad it was a video conference, not just audio, as she was easy on the eyes. She looked much younger than I had expected. Siri wore one of those new computer head-gear add-ons for the iPhone. It could not only project a screen interface, but it had biometric devices and neurophysiological manipulation capacity. She did seem very relaxed. She also had on some kind of matching glove. It probably had some special purpose, but I did not know what it was.
I went over the results of the first sample search with them. They seemed uneasy about how quickly I had completed the project. The truth is, most were still stunned by my decision a few days ago to do the review myself. They had expected their contract lawyers to do both the relevance review and the confidentiality protection. Most figured I would change my mind. They thought that was what the meeting would be about, not to announce that I had already finished it. But what really surprised them was the prevalence rate I found. Siri did not say a word, but seemed troubled when she heard that.
The chief scientist blurted out that he had never worked with a yield rate that low. He said they usually had prevalence rates of between ten and thirty percent. They once had a project as low as 3%, but he had never seen a prevalence rate as low as 0.42%. He thought I must have made some kind of mistake.
That led to a long discussion where I learned that they never do collections, and do not really pay much attention to that. Another vendor would always do that, or, as in this case, the client did it themselves. I asked them to find out whether they usually got raw data collections of a custodian's entire files, or search-filtered collections, such as all emails and attachments that contain certain terms or metadata. They said they would check on that and get back to me right away. We then moved on to the 16(b) hearing prep issues. That took a couple of hours.
By the time we were done with that work, a young man came on camera and said he had completed the collection research for their last three projects. They were efficient, I have to say. He found that in addition to the usual deduplication, deNISTing, and date range restrictions, all three had some other search filter in place. Two were simple keyword search filters and another was a more complex multimodal filter that also included a simplistic type of machine coding. Bottom line: I was right. They were used to working with pre-culled data sets. But not only pre-culled, over-culled. The keyword searches, even though very broad and imprecise, were still likely to omit over half of the relevant documents. Their clients had made the fundamental mistake of not feeding the predictive coding search engine all of the data. That explained why they usually had ten to thirty percent prevalence rates.
They got it right away. I could almost see them processing the implications of this on their software's fully automated approach. I then began a series of questions, automatically switching into expert deposition mode, and quasi-hostile witness cross-exam at that. No, they had not tested their software on anything less than one percent. This was going to be a first. No, they were not sure how the algorithms would react. Then the sales guys at the meeting went into customer soothing mode and began assuring me that everything would work fine, it might just take a little longer. The scientist and the techs then joined in and started explaining very rapidly why it would not make any difference. I nodded but was not at all convinced. Did they think I had never seen a snow job before, just because I'm from Florida?
When I left that meeting my concerns regarding the reliability of the software were stronger than ever. This was likely to be a disaster and there was nothing I could do but watch. The GC would never change his mind. Linda had made that clear to me.
The vendor’s fully automated software would not work with a 0.42% prevalence dataset. It would take forever to get going by random samples, and once it did reach a relevancy vector, the scope of its recall would be arbitrarily limited. I was concerned that the random approach would not catch unusual types of relevant documents that were not included in the random selections. It could even miss highly relevant ones. That could be a recipe for sanction soup, a very untasty broth if ever there was one. I would rather drink Socrates’ hemlock than have that happen to my client in my case. I had to figure a way out.
A Plan is Hatched
I considered the possibility of a personal confrontation with Google’s GC to talk some sense into him, explain what a big mistake he had made picking this vendor. Then I remembered what little hard proof I had of that, yet. I also thought about how some of my partners would react to that approach. No. I had to come up with something else. I should talk to Linda again, explain what happened with the vendor. Maybe she could help. After all, we both wanted to protect the company and steer it clear of sanctions. And, neither of us trusted this vendor.
I called Linda the next morning, early, and was surprised to get through to her right away. This was the first one-on-one call I had had with her. She was much nicer to deal with than I had expected. I told her the story of the low yield. I was now convinced the software would miss obvious documents that a hybrid multimodal approach would surely find. Linda said that reminded her of something she had read in the vendor’s contract with Google, something about forfeiture of fees if the software was defective. I asked for a copy of the agreement. After that we both lamented that neither of us could think of anything to do but further paper our files with CYAs, hardly a satisfactory solution. Then I got a call about a new case and helicoptered in to help a stressed out team in California.
When I got the vendor agreement the next day I read the salesman-like terms and provisions, most very cleverly favoring the vendor in non-obvious ways. Then I found the defect warranty provision Linda was thinking about. It specifically defined a defect by comparison with another software program running the same search under the exact same conditions. That made it virtually impossible to ever prove a defect and get a refund. But it gave me an idea. It would be a big gamble, but my friends and I would be taking the risk, not the client. It might just work to save the day.
I did not have time to think it through. The 16(b) hearing was this afternoon and I had to complete preparations for my part of the hearing.
Delusions of Duty
I had already endured many meetings with opposing counsel and I was looking forward to a meeting with a judge in the room. Their e-discovery expert was ok, but her hands were tied, much like mine, on the relevancy scope issues. The trial jockeys had final say on that. Both sides had reached a point of barely civil intransigence. The trial lawyers, most of whom looked like their counter-parts from the Eighties, were going to argue several relevancy issues at the hearing.
I knew that the behavior of the attorneys would magically change at the 16(b) hearing when they stood for the judge to enter. It was like a Sunday School pageant, with everyone pretending to be an angel. I was looking forward to the new, improved versions of their personalities. So too was my discovery counterpart on the other side. We did not want to go forward with our respective search projects until the relevancy issues were resolved. We hated re-dos. We were hoping for a clear focus of the issues and had about twenty-five exhibits lined up for projection on the court’s HD equipment. If our trial lawyers were really good, we had a chance of getting the relevancy rulings we needed from the Bench.
Unlike the trial boys, we discovery lawyers did not really care where the initial relevancy lines were drawn, we just needed the lines to get going. After all, the odds of any document actually making it to the final trial exhibit list were over 100,000 to 1 against. The fighting over finer points of relevance was a necessary exercise, we knew, but we also knew it would all shake out in the end. If we did our search job correctly the few hot documents would inevitably percolate to the top. Nothing else really mattered. The fine-tuning of initial relevancy vectors was not that important, except to avoid re-dos.
This is something that many trial lawyers did not understand. A few of the old and powerful discovery lawyers around did not get it either. They were of the old school, where you TIFFed and reviewed everything and fought against all disclosure.
Many e-discovery vendors helped perpetuate these delusions. They fed fuel to the fire. They promoted over-discovery and fight-everything, not because they believed in the tactics, they knew better from experience, but because their yearly bonuses depended on it. They justified a little exploitation of their customers as part of their duty to the corporation’s shareholders to maximize profits. These vendors benefitted mightily from over-discovery. They loved a review re-do and forensic fights. They did not have a long-term view of corporate profitability.
These vendors had much in common with the old school lawyers they exploited. The lawyers also justified their over-contentiousness and over-review under the cloak of duty. The so-called duty of vigorous representation of their clients. They would say that all litigation, including discovery, is war! They were not about to compromise their duty to their clients by cooperation and proportionality. This also led to more and more fees. Everyone benefitted from this ill-informed mutual back-scratching, except, of course, for the clients stuck with the bills, and the judges who occasionally had to listen to their bickering. I understand that this has been going on since the nineties.
I was still not quite sure where Siri, the Borg Queen, fit into this, but she did not seem to be one of the old-school vendor types. She seemed bent on constrained review, and cooperation. Siri’s problem was she put too much trust in technology. But, who knows, maybe she was right? I blanked out for a few seconds, or was it minutes? I had spent way too much video-time with the Borg CEO yesterday. I needed to stop daydreaming about Siri and focus on the judge and hearing this afternoon.
16(b) Hearing of the Future
The 16(b) hearing started when a buzzer went off and we all stood. The magistrate judge entered the room in his black robe, sat down at the bench about three feet above us, and started looking at his computer screen while the case was called. Everyone sat down. We then took turns making our appearances of record. One of the mikes had the volume set too low and that took a second for the clerk to adjust. Every word we said was being recorded. The judge occasionally glanced down and smiled as the attorneys introduced themselves and the client representatives. The 16(b) hearing notice required that they attend, along with each side’s vendor expert. This was one of two 16(b) hearings for the case, the one devoted to discovery issues. After the initial formalities the judge went right to it.
The judge noted that our vendor was new to him, but he had looked them up, and expressed an interest in their approach. Siri had come to the hearing and he greeted her as if he had met her before. Although some judges had commented on the Borg approach, all negatively, he had not. He did not address the merits of their approach, but instead moved on to China Space’s vendor. He had met their lead expert before in other cases and generally knew how their software worked. It was a popular type of multimodal predictive coding software. The judge chatted with him for a minute about an upcoming event. This judge did a fair amount of speaking at e-discovery invites, and unlike some, he got his invitations on merit. That meant he knew e-discovery better than the trial lawyers, and also knew me and my e-discovery lawyer counterpart for China Space. The judge was pleasant to us as usual and threw in a few comic remarks at our expense. But he also made some flattering statements about us for the benefit of our clients.
He noted that all technical issues had been agreed to in the stipulation we filed last week. We had agreed to what was beginning to be called the mutual QT approach. QT stood for quasi-transparency. We agreed to limit our search disclosure of irrelevant documents to the random sample sets taken before and after the review as part of the quality assurance tests, including disclosure of the null-set of excluded documents. We did not have to disclose quality control data, nor the initial seed set, nor the subsequent training sets. Only the metrics of those searches had to be disclosed, along with all relevant documents at periodic production points designed to better control work product protection.
That meant the disclosure of irrelevant documents by both sides was very limited. Both sides also had to make only limited work-product type disclosures of their search methods. I made sure of that for my own reasons. Too much mandatory disclosure would destroy my plan to save the client from the Borg approach. Predictive coding on the QT suited me just fine in this case.
Both sides also reserved rights to object to the reasonability of the other side’s search efforts. This was standard and anticipated. The judge just noted it and said he hoped our approaches worked and he did not have to hold later hearings on that. He looked directly at Siri when he said it. I took this as a signal that he had doubts about her system, much like I did. But I was there with her, so he moved on to other issues quickly. Still, I knew that if the review screwed up, as I feared, he would not hesitate to impose sanctions. This judge was well-known for that. The sanctions would not be dispositive, but would likely be monetary. At the very least we might have to do a redo, which can itself be very expensive.
The judge went out of his way to thank us for reaching agreements. He said just yesterday he spent an hour hearing quality assurance test arguments. The core of the argument in the other case was the confusion matrix. Four of us in the room knew what he was talking about. The discovery partners and associates in my firm would too. I was always telling them that you need to understand the confusion matrix to avoid getting lost in it.
To our surprise the Judge turned on his projector and a standard GC Version of the Matrix came up. He explained he had it from the hearing yesterday. He then used the diagram to help ask us a few questions regarding the mutual quality assurance tests we had agreed to. The Judge knew this was critical to avoiding reasonable search arguments later on. He said he wanted to be sure we had covered all of the bases.
CONFUSION MATRIX
                   | Truly Non-Relevant     | Truly Relevant
Coded Non-Relevant | True Negatives (“TN”)  | False Negatives (“FN”)
Coded Relevant     | False Positives (“FP”) | True Positives (“TP”)
Accuracy = 100% – Error = (TP + TN) / (TP + TN + FP + FN)
Error = 100% – Accuracy = (FP + FN) / (TP + TN + FP + FN)
Elusion = 100% – Negative Predictive Value = FN / (FN + TN)
Fallout = False Positive Rate = 100% – True Negative Rate = FP / (FP + TN)
Negative Predictive Value = 100% – Elusion = TN / (TN + FN)
Precision = Positive Predictive Value = TP / (TP + FP)
Prevalence = Yield = Richness = (TP + FN) / (TP + TN + FP + FN)
Recall = True Positive Rate = 100% – False Negative Rate = TP / (TP + FN)
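As a sketch, the whole table reduces to a few lines of Python. The counts below are hypothetical, chosen only to illustrate the complementary pairs (Accuracy and Error, Elusion and Negative Predictive Value) that the formulas imply:

```python
def matrix_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """The confusion-matrix formulas above, as functions of the four cells."""
    total = tp + tn + fp + fn
    return {
        "accuracy":            (tp + tn) / total,
        "error":               (fp + fn) / total,
        "elusion":             fn / (fn + tn),
        "fallout":             fp / (fp + tn),
        "negative_pred_value": tn / (tn + fn),
        "precision":           tp / (tp + fp),
        "prevalence":          (tp + fn) / total,
        "recall":              tp / (tp + fn),
    }

m = matrix_metrics(tp=80, tn=900, fp=20, fn=40)    # hypothetical counts
assert abs(m["accuracy"] + m["error"] - 1) < 1e-12                 # Accuracy = 100% - Error
assert abs(m["elusion"] + m["negative_pred_value"] - 1) < 1e-12    # Elusion = 100% - NPV
print(f"recall {m['recall']:.0%}, precision {m['precision']:.0%}, elusion {m['elusion']:.2%}")
```

Once you see each metric as a ratio of cells, it is much harder to get lost in the Matrix.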
Our stipulations on the Matrix tests were somewhat general and the Judge wanted to know exactly how we had resolved the details that so bedeviled his hearing yesterday. I don’t think he had ruled yet in the other case.
Making Up the Law of the Case
We were very careful in our answers, well aware that our explanations would become binding agreements enforceable by court order, including contempt. That’s why the microphones were on and everything was recorded. For all I know the video cameras were also on. I had forgotten to check. Still, since I talked about this damned Matrix all of the time, I enjoyed a chance to talk shop with a judge who really understood. That was still a rarity, although there were many more knowledgeable judges around now than in the old days.
A couple of times I joked with the e-discovery lawyer and her vendor on the other side when the judge asked us a few questions that we had not even discussed before. We were quick enough on our feet and able to agree and clarify a few points with the Judge’s help. After a while there is no pressure and you get used to making law, especially here, where it was just for this one case.
How To Pass An Elusion Exam
The Judge seemed primarily interested in the details of the Elusion Test we had agreed to. Both sides were, of course, going to use the same tests to determine when their search would be concluded, at least for the all important first round of discovery. Very few of my cases actually engaged in the second, much less third round provided for in this Judge’s standard order. The quality assurance tests were not dispositive, and both sides always still reserved rights to object, even if the search passed all tests. They just could not object right away, but would have to wait until they had the full production.
Basically the Judge wanted to know what results would be considered a passing grade in the Elusion Test. We said that although there was no specific maximum number above which the test would automatically fail and trigger another round, we both expected the percentage of False Negatives (FN) to be very low. Our pass-fail test metrics had more to do with the types of FN found. We had agreed to a qualitative version of Accept on Zero Error. We agreed that the Machine Learning would be considered a success, and further rounds of training would be unnecessary, if two conditions were met: (1) none of the FN were highly relevant documents; and, (2) any FN demonstrating a new type of relevance, a type of relevant document not seen before, was of only marginal relevance.
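A minimal sketch of that qualitative Accept on Zero Error test, in Python. The field names are illustrative assumptions of mine, not taken from any real review platform:

```python
def elusion_test_passes(false_negatives) -> bool:
    """false_negatives: relevant documents discovered in the null-set sample.
    Each entry is a dict with illustrative keys: 'highly_relevant',
    'new_relevance_type', and 'marginal'."""
    for doc in false_negatives:
        if doc["highly_relevant"]:
            return False    # condition (1): zero tolerance for highly relevant misses
        if doc["new_relevance_type"] and not doc["marginal"]:
            return False    # condition (2): a new type of relevance must be marginal only
    return True             # more-of-the-same misses do not force another training round

# A single highly relevant miss fails the test; a marginal new-type miss passes.
assert not elusion_test_passes([{"highly_relevant": True, "new_relevance_type": False, "marginal": False}])
assert elusion_test_passes([{"highly_relevant": False, "new_relevance_type": True, "marginal": True}])
```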
We did not care about any missed relevant documents that were just more of the same. So much of e-discovery was like that. The same or similar documents and communications that were not technically duplicates would keep showing up time and again. The redundancy in today’s collections of documents was ridiculous. I wished that the vendors would finally agree amongst themselves on a new ISO standard for extended deduplication. It was as if we were all drowning in a mountain of unoriginal books, most of which just plagiarized off each other.
The judge understood the relevancy ranking distinctions we were making. That our Elusion Test pass-fail criteria primarily had to do with the importance, if any, of the particular FN documents to the merits of the case. The only kind of document assigned to Accept on Zero Error were highly relevant documents. I joked that I hoped we would not have to return to argue about whether an FN document was highly relevant or not, but noted that this kind of dispute was not uncommon.
What Is Relevant?
The judge then looked at the trial lawyers, who he knew would argue the relevancy issues, and told them with a mock-stern look that he did not want to see them again. They all laughed. Then the judge said he was satisfied with the electronic portions of the discovery stipulation and moved on to some deposition and interrogatory issues. After that he asked if counsel was prepared to make arguments on the one stipulated issue remaining for resolution – the relevancy issues triggered by the first production of documents, the random sample of 2,401 documents that each side had already produced to the other.
Of course they were ready, and the judge told the plaintiff’s lawyer to go first. He made an argument as to the relevancy of three documents that our side considered to be irrelevant to any of the genuine issues of material fact in the case. He also made a counter-argument to our side’s position that two of the documents he considered irrelevant were actually relevant. He forcefully argued that both of these documents had no probative value to any possible issue in the case. This was a typical relevancy argument. It focused on the five documents in the initial random sample that the parties could not agree upon. Per local rule we exchanged our sample sets, with privilege only withheld and logged. A dispute on only five out of four thousand, eight hundred, and two documents was a pretty good achievement. Both sides had worked hard to get this far.
While the trial boys talked I took control of the court’s media display panels for attorneys and displayed the five documents at the appropriate times, which got pretty challenging when the judge started asking questions. The displays included the highlighting that counsel had agreed to. By the way, that damned agreement took over four hours to reach, since both sides’ trial counsel wanted to sign off on those important details. They all knew that highlighting could have persuasion consequences. They wanted to make a good first impression.
Next my attorney spoke and argued our positions on relevancy. We mirrored the other side and explained why three of our documents were not really relevant, that they were True Negatives, not False Negatives. Thankfully he knew enough of the vocabulary to use the terms correctly.
The Judge then asked a few questions, mainly probing why one side considered documents to be relevant that the other side had coded as irrelevant. After arguments he took a few minutes to write something on his computer while we all waited in strict silence. I was holding my breath, hoping that he would rule right away so I could start my search. I hated it when some judges made you wait a few months on relevancy issues.
He did not disappoint. The judge ruled against the pro-relevancy arguments of both sides. He agreed with the decisions made by both sides to identify the documents as irrelevant. I was relieved to hear him take a narrow view of relevancy. Then he briefly explained his rationale on the record as to each of the five documents. The judge said he would write a full written order later and looked over to his law clerk and winked. He asked if we had any questions on his bench ruling, for otherwise he would expect us to begin compliance with the ruling immediately. He pointed out that under local rule we now had sixty days to complete the initial cross-production of ESI.
I saw Siri on her cell phone on the way out talking to her document manager, telling him that they could begin the review project. The initial training had already been done, so they could proceed in earnest to begin the fully automated review. I had told them they could not begin until we were certain of the initial seed set. We were now certain. The seed set we selected had been fully approved by the court; all three objections had been overruled. Siri knew she could begin and did not even ask me first for confirmation.
From Thought to Action
Elusion test or not, I was still concerned about Siri’s fully automated approach. All of the machine training documents would be randomly selected, even the initial seed set. I had been thinking about a plan to save my client from the Borg approach. I had only a general outline of what to do. I knew all of the details would fall into place soon enough. It was time to move from thought to action, and risky creative action at that. I was reminded of Ray Bradbury’s famous quote about thinking:
Don’t think. Thinking is the enemy of creativity. It’s self-conscious, and anything self-conscious is lousy. You can’t try to do things. You simply must do things.
I knew Bradbury meant Don’t think as a poetic counter-balance and a warning. He was a man who thought a lot and was very creative. If the saying were literally true, that you should never think, then all of us who think for a living would be lost. We would be paralyzed by analysis. Truth be told, there is considerable danger in too much thinking, which is why Bradbury wrote what he did. Fortunately today there are several apps for that. As Yoda would say: Do or do not, there is no try.
As an old computer guru (that used to be a common phrase), I found inspiration in what is known as the hacker’s credo: code wins arguments. That meant to try it out, to take action. This was an application of the pragmatic test of truth, whether it works or not, in the uniquely American tradition of William James and Charles Peirce. Hackers used that philosophy to change the world by making insanely good technology. Facebook’s founder, Mark Zuckerberg, called this can-do attitude the Hacker Way, and articulated five basic tenets of this philosophy.
I knew I had to act fast and take bold steps. I started with several emails to my right hand in the firm, followed by a call to my top outside review project manager. He did review and project management at the same time, which, with today’s software, was not too hard. Next I talked to my top two outside reviewers and explained my proposal. I was asking all three of them for a big favor. I wanted them to work on a contingency basis.
The plan was for them to do a review of a copy of the same four-million-document database, but using our multimodal software and methods. We shared the same values, so I knew they would probably go for it, that is, if they had openings in their schedules. Plus, they all owed me a few favors.
I also made a call to my firm’s current favorite vendor. They understood right away that if my plan worked, and their competitor’s software and methods failed, it would be all over the legal press. It would start out secret, but, if we won, would end up very public. It could have a big impact on the market. They knew they could not buy that kind of publicity, so they agreed to further lower their already low bulk-rate licensing for this project.
I explained to my team that they would do the same project as the eight lawyers on the Borg team. It sounded like fair odds to me. I knew my team and methods were superior to the Borg. That’s what I told my team during the final conference, where I asked everyone to make an in-or-out decision.
I joked and told them resistance was futile, that we had to stand up to the Borg and save the industry from the dreaded hive mind. They were all Star Trek fans and knew that a hive mind was a negative type of collective sharing of minds into a single consciousness where the individual would lose all personal identity, all personal volition and creativity. They would become a mere drone. I made a blatant appeal to their core values, a tactic that rarely fails with fellow hackers.
I asked if they would stand up for individual lawyer creativity and free will. Did they want the freedom to do other kinds of searches, or just be tied to machine learning? Did they want to sit back and do nothing while the Borg eroded our specialty by over-delegation to machines and random chance? The appeal to values hit home, just as I had hoped. My request was not about helping Google, or me. It was about protecting the entire profession from a painful wrong turn.
They were all in.
Client Consent
Next I went to Linda at Google, explained the plan, and asked for her permission. Openness is a key part of the hacker way, plus it was the right thing to do. This was their project and their lawsuit. I was just their e-discovery attorney. I told her I understood that she had no budget for this second shadow team, but since we felt so strongly about the high risk of the Borg approach, we were willing to assume the entire monetary risk of failure. Still, I wanted her to agree that if our concerns were right and the Borg approach proved defective, then half of any refunds Google received from the vendor would be awarded to us as a quality control addendum. I told her I had studied the contract provision she had told me about before and I thought it was possible. She looked it over again during the call and agreed.
Linda thought the proposal was more than fair. She also volunteered to make reasonable efforts to pursue a claim against them if that seemed necessary and warranted. She was psyched and thankful about the extra effort we were willing to make on our own dime. She understood this was about more than helping Google. I sent a confirmatory email when we hung up. I did not copy the GC. That was not my place.
During the course of our conversation we did the math on how much money a full refund might bring. I made full disclosure to Linda. I had to. Unlike a vendor, a lawyer has a fiduciary duty to their client. I explained that if Google was able to get a full refund, then our half of the refund would be more than our usual charges for the review. She was surprised by our review rates, tied as they were to exceptionally high review speeds, smart culling, minimal management, and low-cost bulk-rate software. She said she wanted to talk to me later about a few other cases coming down the pike. The risk was already starting to pay off.
Linda had no problem with my reviewers and firm getting a bonus in this way to compensate for the risks. Google could do nothing but benefit from the whole scheme. It was effectively getting two reviews for the price of one, no matter what happened. We had a deal and I had the full blessings of my client.
Good News for My Reviewers
My first move was to tell the review team and my firm that we had a deal, and that if we were right and the Borg way failed, and if Google got a full refund, then our half of the refund could result in about a forty percent bonus. I had been hoping for that, but did not want to say anything to my outside reviewers before I talked to Linda.
My reviewers and software supplier had agreed to work on a contingency without any hope of a bonus, a gesture I greatly appreciated. Even though there were many contingencies to payment, the bonus hope was good news. Still, all three of the reviewers knew that they risked getting paid nothing for up to two weeks of their time. The same time-risk applied to me to a lesser degree, as I would have to supervise a second team and make relevancy calls where needed. But I was on a salary, so my financial risk was less, and so too was my potential reward. This was not about the money to me. The small portion of the bonus I reserved for the firm would be indirect gravy, not really a big deal compared to the size of the overall fee for the case.
The risk to the software vendor was also small, but they had some skin in the game too, as the set-up and maintenance costs were all out of their pocket. They also agreed to include a project manager at no cost. They knew I would not need any hand-holding.
What made the contingency risk worse for everyone was that we did not know exactly how long the review work would take. It might take several weeks to complete, maybe more, depending on where on the bell curve the truth lay. If it was on the left side of the peak, and the yield was on the low side, then the three reviewers might finish in less than 40 hours apiece (total 120 hours) and the bonus would be more than 40%. But if it was closer to the 31,000 documents side, more relevant documents would likely be found. That would slow and extend their review. In that case we could all be in for a long haul and, assuming we got paid at all, a lower bonus, or even no bonus. Based on the spot projection it would probably come in at around 17,000 relevant documents, where the bonus would be about 40%. We were not too worried about the worst-case scenario.
All of us involved in this contingency deal were lawyers. We all understood better than most how risks and probabilities worked in review projects. I sent out the review contract for everyone to e-sign and return.
Deep Into Borg Territory
We were now entering the peak of the search project. The Borg team of eight drones had been reviewing machine selected documents for over a week. They were averaging 50 documents per hour, which was typical for them. They were able to review eight hours a day apiece, working ten-hour shifts to do so. That meant 3,200 documents per work day (8*50*8=3200). After five days of work and 320 hours of review time (they took the weekend off with my approval), they had reviewed 16,000 documents (3200*5=16000). Out of that total, only about 2,000 were coded relevant. They were proud of a job well done. They expected to review another 16,000 documents, at least.
They were paid by the hour and always hoped a project would go on longer than expected. They lacked any real motivation to work hard. They seemed perfectly happy to let the computer do all of the searching for them. They seemed satisfied not to have to think, or do anything but code what was put in front of them. Running searches on their own to find good documents was the last thing on their little hive mind. They did not care how many relevant documents were found. It was all the same to them. They were a perfect Borg review team. Just did what they were told.
I was deep into the hive mind at that point. They thought I was one of them. They did not know that I was an infiltrator. I smiled and nodded at their reviewer consistency quality control tests. They thought that satisfied my only concern, inconsistent reviewer coding. But I knew more than I let on. The truth is, I was more convinced than ever that even if all eight of their reviewers were consistent, a claim I very much doubted, their approach would still fail.
At this 16,000 document review point they had completed 16 iterations of machine training. They would review exactly 1,000 documents per round. Not a document more or less. I would be sent a report at each iteration point. The number of relevant documents found in each round of 1,000 would vary widely. Sometimes they would only find 4 documents. This is the number of documents you would expect blind chance to turn up in a document collection with a relevance yield of only 0.42%.
Many times several hundred of the thousand documents the computer put forward for the review team would in fact be relevant. Over the 16 rounds they averaged 125 relevant documents per round (2000/16=125). That was 12.5% of the 1,000 documents the Borg software selected for review. Not a particularly good overall precision rate. Still, they pointed to these indicators as positive for quality control. The average yield of 12.5% relevant was almost 30 times higher than a random chance yield of 0.42% (12.5/.42). That is what they focused on to try to reassure me.
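For readers who like to check the arithmetic, the Borg numbers at this point reduce to a few lines of Python. This is just an illustrative sketch using the figures from the story, nothing more:

```python
# Borg team stats after 16 rounds of machine training
docs_reviewed = 16_000       # 16 rounds x 1,000 documents each
relevant_found = 2_000       # documents coded relevant so far
rounds = 16
random_yield = 0.0042        # the collection's 0.42% relevance yield

avg_relevant_per_round = relevant_found / rounds   # 125 per round
precision = relevant_found / docs_reviewed         # 0.125, i.e. 12.5%
lift_over_chance = precision / random_yield        # almost 30x better than blind chance

print(f"precision = {precision:.1%}, lift over chance = {lift_over_chance:.0f}x")
```

The "almost 30 times better than chance" talking point is just that last ratio; as the story notes, it says nothing about how many relevant documents are being missed.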
In addition to the daily reports I’d be notified throughout the day of any documents that were grey area for them, where they were not sure about relevancy. I’d look at the document, make a decision, and let them know. Every now and then I would have to ask my trial lawyers for a final ruling. They had endured many extended discussions with opposing counsel regarding relevance and were the final arbiters of our relevancy decisions.
Sometimes I would give the reviewer drones a long explanation on relevance, but it usually was not required. At first I received a lot of relevancy questions from the Borg eight, but after a while that died down to only a couple of questions a day. I figured that would happen and was not impressed. I was glad that I had a second secret review going, secret at least from the first team, the Borg team. I expected that it would become very public soon when the Borg approach crash-landed in this big case.
I would not share my decisions or the documents found by the Borg with my shadow team, but made sure my decisions were consistent between the two teams. I was doing double work, but the extra security made it worth it to me. I would never rely on that Borg team alone.
Checking In With My Team
It was 5:00 PM EST, time for the daily video conference with my personal review team. This small band of three had taken to calling themselves the Federation. I was in Florida, and my lead reviewer in NYC, the same time zone. But our two other team members were in Northern California and Hawaii. The three of them worked out a review schedule covering a 16 hour sequence, six hours apiece, with one hour of overlap for continuity. That way we could keep a near continuous operation going with just eight hours off for rest and quality control. While they rested, I worked, looking for anomalies. I did the same quality control for both teams as far as the software allowed.
This linear-like schedule made it easier for us to coordinate the machine learning sessions – the iterations – and to run linear quality control on consistency. We had a lot of other secret sauce going to maximize efficiency and quality.
The three Federation reviewers had worked a total of 90 hours that week, 30 hours per reviewer. During that time they completed only five iterations, not sixteen like the Borg. Yet they had already found more relevant documents than the Borg group by a factor of two. Just over 4,000 relevant documents had been found with over thirty different document types. Keyword and concept searches proved to be big contributors. The first seed set was well populated with a diverse group of relevant documents.
In days one and two of the project all my reviewers did was multimodal search to prepare a large seed set to start the training. They found over 1,500 relevant documents for the seed set. We did not begin to use predictive coding until day three. The Federation did two large iterations the third day. They did three more machine training sessions over the next two days.
It would take the host-computer a few minutes to do the calculations, sometimes as long as thirty or forty-five minutes. My reviewers would plan their breaks around that. My Hawaii reviewer claims she would often review on the beach and go surfing on the breaks. I did not care how or where they did their job, just that they did it well. I had no worries about that with this group. They used the software chat feature regularly and kept their coding pretty consistent. They had worked together many times before and did not hesitate to ask me for rulings. We all knew that the hive mind for relevancy was a good thing, so long as it did not involve total assimilation.
I could see from the visual graphs in the software reports that the Federation’s precision rate was increasing. In other words, the rate of relevancy identification was already improving in each round. That compared with the Borg approach where after 16 rounds the precision was still going up and down, with extremes and no patterns. At least not yet. In some rounds they would find many relevant, and in others only a couple. There seemed to be no rhyme nor reason to it. The Borg showed no improvement so far, but I knew its AI would eventually catch on, and that soon they would start to improve precision as we were already doing.
The Federation’s accelerating precision was beginning to slow down their review. The fifth day was the slowest of all. As the machine grew smarter and served up a higher percentage of predicted relevant documents to code, the review rate slowed down. That is because it takes longer to identify and code a relevant document than an obviously irrelevant one. The percentage of obviously irrelevant documents was shrinking with each round.
All told my shadow Federation team had managed to find 4,000 relevant documents by a manual review of 10,000 documents. That was a precision rate of 40%. This was terrific, especially as compared to the Borg’s precision of 12.5%. Over the five days the three reviewers put in a total of 90 hours of work (30 hours a piece). That put their review speed at 111 files per hour, which was still over twice that of the Borg rate.
Comparative Analysis
Most of the time when drilling through irrelevant documents, especially at first, my reviewers attained speed bursts of up to 500 files per hour. They were, after all, the best in the business. Each of them liked review work, and all were star graduates of e-discovery team training. But the overall rate averaged down to only 111.111 files per hour. That was because after the initial seed set, when the software started serving up documents for review to improve its training, they would break from mere linear review of those computer selected documents, the Borg way, and do their own side searches from time to time. They were allowed to think, no, encouraged to. I trusted them, their judgment and search skills.
They were doing multimodal review as they deemed necessary to find more relevant documents for the machine training. The occasional break from review to search slowed them down, but allowed them to find and retrieve many more documents than if we relied on the computer alone.
This is what I called hybrid multimodal review. It was multimodal in that it used many search techniques, not just predictive coding, and it was hybrid in that it used human intelligence from highly skilled attorneys not only to code, but also to guide the search. It did not rely on predictive coding alone (monomodal), nor on computer intelligence alone (fully automated), to do everything but make relevancy determinations. That was the Borg way, a method that, my team and I felt, rather stupidly minimized the skills and intelligence of the human reviewers.
The key statistic for comparative analysis is that so far we were able to find 44.44 relevant documents per hour (4,000 relevant documents found divided by 90 hours of review work), whereas the Borg approach had found only 6.25 relevant documents per hour (2,000 relevant documents found divided by 320 hours of review work). So far the Borg approach was only 14% as efficient as our multimodal approach (6.25/44.44).
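The efficiency comparison is simple division. A short Python sketch, again using only the story's own figures, shows how the 14% comparison falls out:

```python
# Relevant documents found per hour of review, after five days of work
fed_relevant, fed_hours = 4_000, 90      # Federation: 3 reviewers x 30 hours
borg_relevant, borg_hours = 2_000, 320   # Borg: 8 reviewers x 8 hours x 5 days

fed_rate = fed_relevant / fed_hours      # ~44.44 relevant docs per hour
borg_rate = borg_relevant / borg_hours   # 6.25 relevant docs per hour

relative_efficiency = borg_rate / fed_rate   # ~0.14: Borg only 14% as efficient

print(f"Federation {fed_rate:.2f}/hr vs Borg {borg_rate:.2f}/hr "
      f"-> Borg at {relative_efficiency:.0%} of Federation efficiency")
```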
The Borg approach had found 2,000 relevant in reviewing 16,000. The 2,000 relevant found were in twelve different document types. That represented a precision rate of 12.5%, a rate they bragged about as 30 times better than chance (0.42%). They did not know, as I did, that the multimodal approach had a precision rate of 40%. This was over 95 times better than random chance precision and 3.2 times better than the Borg (40/12.5).
Aside from the obvious inefficiency, the lower number of document types is what really concerned me. I was convinced that, despite their claims to the contrary, the irrational blind-chance Borg approach would miss too many outliers, maybe even miss obvious documents. I expected the Borg methods to increase the number of relevant documents found as the iterations ground on. But I didn’t think they would expand and change the document types after a certain amount of training. Random chance can only take you so far. I knew human intelligence would win in the end.
Dark Times for the Federation
Although things appeared to be going well, I had to tell my team that so far the results were nowhere near good enough for us to get paid. I did not give them specifics, nor share particular documents with them. I had to keep them independent. But I gave them the big picture, that even though we were doing over seven times better than the Borg on a work-hour per relevant document found basis, and even though we had found twice as many relevant documents, the differences were not enough to prove that the Borg review was defective. It just proved theirs was a less efficient search. That impeached their advertising, that’s for sure, but did not constitute a failure, nor prove defective software. If this kept up, we would lose.
I was beginning to doubt myself. Maybe I was wasting my team’s time and my own. Maybe the Borg approach was not as bad as I had thought. Or maybe it was bad alright, but the email evidence in this particular case was so unimportant, that low recall would not make any difference? What was I thinking taking a chance like this? How would I ever make it up to my team if I ended up wasting their time for two weeks? I knew I would have to come out-of-pocket to make it up to them. I wondered if they’d just accept free airline tickets to somewhere and still work for me again.
I was getting tired of supervising and QCing two teams at once. I was working this around the clock, letting a beard grow to save time. I was afraid that if I lost, the Borg vendor would grow stronger. I could be assimilated myself, along with my whole team. More and more clients might insist we use their software and methods. Imagine the boredom of drone existence, just reading what the computer told you to read? Not using independent thought? Not using any other kinds of search? That was not for me. I’d quit first.
I had to tell my team that things looked bad, that there was no clear proof of defective performance. They knew that was required by the software license agreement for Google to get a refund and them to get paid. Neither review team had found a document considered highly relevant. That was the most powerful evidence of effectiveness we knew. We had to find hot docs or risk failure, maybe even assimilation.
I admitted to my team that I might have miscalculated. Maybe there was nothing in Google’s email of any importance to the case? They reassured me. They said one or two of the custodians were very candid in their email. They expected to find something hot any day now. I think they were just trying to cheer me up and keep up the esprit de corps. I was not convinced.
I told them that the documents they had found so far, that the Borg had not, were not all that earth-shattering. The evidence differentials showed superiority, but not defective search under the vendor’s agreement with Google. Also, we always knew that the Borg approach would start off slow and pick up steam with more iterations. Our early lead was bound to narrow in the second half. I was down.
This was bad news for my shadow review team, but they kept up their bravado just the same. They knew that to get paid the Borg team would have to make a big mistake. Either that, or they would need to do something great, and catch something important that the Borg missed. They were well motivated, that’s for sure. I had to hand it to them, I did not hear a single complaint. They were a terrific team. They called me gloomy. I started thinking to myself about worst case scenarios. I could not ask my firm to pay the shadow team anyway. This was my idea, my risk. It would have to come out of my pocket, but at least it would be a personal tax deduction. I’ve had to do that before.
I finished the meeting with my Federation friends by responding to their daily relevancy questions and comments. I gave the same answers to both groups, both mine and the Borg, but I did not suggest new documents from one side to the other. I had to keep it squeaky clean and be impartial for the defect clause to be effective. I papered my file with detailed status reports, but not literally of course. I hated paper. Hated this project too. These were dark times for the Federation team.
Day Nine: Deeper Into Dark Matter
We were beginning day nine of the search. The Borg under Siri’s leadership completed their 24th round of machine training last night. In the past three days they had reviewed 8,000 documents. Each of their rounds was always 1,000 documents, some randomly selected, some selected by the growing artificial intelligence of the software. It was not all random like some Borg type software. Just the initial seed set was random. In the eight days of review the Borg drones had coded a total of 24,000 documents.
The Borg always worked 8 hours of review time per day. With 8 reviewers that meant 64 hours per day of review. Over the last three days, with 192 man hours, they had reviewed 8,000 documents in 8 iterations. That meant their average review speed was 42 documents per hour. This was significantly slower than their average rate of 50 files per hour in the first five days of the review. The reviewers were slowing down their pace because the Borg software precision was starting to go up, meaning the reviewers were presented with more relevant documents to review.
After 24 rounds of machine training they had now identified 6,000 relevant documents out of the 24,000 reviewed. They had coded 2,000 relevant in the first five days, and another 4,000 in the past three days. Their speed of relevant identification was starting to pick up as I had expected. The number of different category types was also increasing rapidly. They were now up to 25 relevant document types. (They had only found 12 at the end of the first five days.) I was getting more and more depressed as I read the latest daily reports. Still, the Borg had found no smoking guns and only a handful of documents from their 6,000 relevant were even moderately interesting to the trial team.
My morning report from the Federation team showed they had now reviewed a total of 15,000 documents and found 10,000 relevant. That was an excellent 66.66% precision rate. This compared to the 24,000 reviewed by the Borg and 6,000 relevant, a 25% precision rate. In the last three days the Federation team had reviewed 5,000 documents and done 7 more rounds of machine training. (In the first five days they had only done 5 rounds, but the first two days were on the seed set, and they often had shorter rounds as the computer became better trained.) That made a total of 12 rounds of machine training as compared to the Borg’s 24.
The Federation file per hour review rate had slowed down as their precision increased, just like the Borg team. In the last three days it was only 93 files per hour (18 hrs per day, times 3 days, equals 54 hours; divided into 5,000). In their first five days they had averaged 111 files per hour.
What was really depressing was that the Federation team had still not found any highly relevant documents. They had found more interesting relevant documents than the Borg, and were now up to 50 different types of relevant documents, but again, nothing earth-shattering.
Still, there was some hope because the Federation team continued to find new types of relevant documents not seen before. When it all becomes just more of the same, and new types don’t appear, then you know you are probably near the end of your search. So there was still hope we would come across a document that was not only new and different, but also powerful.
My three-person Federation team had been working their usual 6 hour review days for a total of 54 hours over the past 3 days. They were feeling pretty good about their work because the total number of relevant documents they had found was now up to 10,000. They knew they were now on the inside of the bell curve of probable relevant documents. They had moved to the right of the 8,159 document set point, but they still had a long way to go to the spot projection of 17,133 relevant documents. That had the highest probability of occurring.
They had hoped to find more relevant than that by now, but then again, they also knew the relevancy identification would probably accelerate quickly before it gave out entirely. By that I mean a point where the computer stops finding any new document types and slows way down in relevancy. My reviewers were now thinking that this corpus was not dense at all, that the yield was probably to the left of the spot projection at the top of the curve. They were beginning to predict it would likely come in at less than 15,000. We would always guess like that near the end of a project. Sometimes bets were made. Not in this review. There was too much at stake to be distracted with project metrics gambling.
The 10,000 found by the Federation was a pretty good result for eight days. It compared very well with the Borg’s results of 6,000 relevant documents found, but the difference between the teams was decreasing. At the end of the first five days the Federation had found 4,000 relevant documents, compared to the Borg’s 2,000. So the Borg were catching up.
Our efficiency measure of the number of relevant documents coded per man hour of review was still much higher than theirs. We were finding 69 relevant files per hour (18 hours of review per day, times 8 days = 144 hours; divided into the 10,000 documents found, equals 69.444). The Borg rate was only 12 relevant files per hour (64 hours of review per day, times 8 days = 512 hours; divided into the 6,000 relevant documents found, equals 11.719). That meant we were 5.75 times more productive than the Borg.
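To see the trend I was worried about, the same relevant-per-hour calculation can be run at both checkpoints. Here is a small Python sketch with the story's cumulative figures (the 5.75 figure in the text comes from the rounded rates of 69 and 12; the exact rates give a ratio closer to 5.9):

```python
# Cumulative relevant documents found per review hour, at day five and day eight
def rate(relevant, hours):
    return relevant / hours

# day -> (relevant found so far, cumulative review hours)
federation = {5: (4_000, 90), 8: (10_000, 144)}   # 18 review hours per day
borg       = {5: (2_000, 320), 8: (6_000, 512)}   # 64 review hours per day

for day in (5, 8):
    f, b = rate(*federation[day]), rate(*borg[day])
    # the Federation/Borg ratio shrinks from ~7.1x to ~5.9x: the Borg are catching up
    print(f"day {day}: Federation {f:.1f}/hr, Borg {b:.1f}/hr, ratio {f/b:.1f}x")
```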
But I was not smiling. We were just proving Borg inefficiency in low prevalence datasets, not defectiveness. We were winning the battle, but still losing the war. I got a dark feeling that day nine was going to be bad.
Federation Team Strikes Gold
I was wrong about this being a bad day. Just two hours later I got a call from Mike, my reviewer in California. He was excited, talking real fast for a Californian. He explained that he had just started his day and was reviewing the latest round of machine training. It happened when Mike was about halfway through the first set of 200 documents selected by the computer as likely relevant. An email popped up that made Mike come to a dead stop.
McDonald33@yahoo.com.
Sent: Friday, February 01, 20__ 3:45 PM
To: ♦♦♦♦♦♦@Google.com
Re:
F G
We can just take it from them.
If you deliver as agreed, you’ll be sit for life
Steve
Mike knew that insiders often referred to Google as just G. He also knew what F and FU meant. Still, he was surprised. He wondered if maybe his medicine was off, or something. The Steve here might be Steve McDonald, a notorious V.P. of China Space who was frequently mentioned in our notebooks. The recipient of the email was some kind of engineer at Google. He knew what sit for life meant, even though there was an obvious typo. It meant big money.
Mike double-checked the players list, and ran several near duplicate and family addition searches. He also ran concept searches and similarity searches. This one email led to several others. When he was sure he had the whole chain and the related web of documents, Mike called me. He had put the whole document-set into a special folder called WE_WIN. Mike was a very positive person. I hoped he was right.
I started looking at the docs while Mike talked. I speed-read through all of them and then started to ponder. Mike was excited, but said he was not 100% sure of the importance of his find. The emails were kind of off the wall, he said. But, it sure looked like industrial espionage to him. He knew that, at the very least, these documents all crossed the line into the Highly Relevant category. They were the first such documents we had found. We had the other two reviewers on the phone by then.
Mike explained what happened again while I studied the emails he found. The main one was an email from a personal account outside of corporate. The address was: McDonald33@yahoo.com. The email was just signed Steve, but it had to be Steve McDonald, the notorious V.P. of China Space. We expected Steve to be their star witness.
The other emails we found with multimodal search filled in the whole story. I am sure you have heard it all before. Typical corporate espionage and fraud. Plus the usual code-words to hide their communications. Keywords would never have found these. We had to give credit to the machine for that find. Although Google had suspected trade-secret theft for several weeks, they did not include it in the counter-claims. They had no evidence intellectual property had been taken, just suspicions. A search for evidence of internal employee fraud was not even part of our search instruction manual. But my reviewers are always on alert for stuff like that in corporate email collections. Unlike machines, they were on the lookout for anything out of the usual.
Multimodal hybrid machine learning had found the gold, the hot emails we needed to defeat the Borg review team. Based on our input, and using our multimodal hybrid methods, the software we were using had acquired the intelligence necessary to recognize these emails as highly relevant. The computer had seen a significant connection between the Google employee and V.P. McDonald, a connection that none of our other searches had uncovered. Our method had worked to find the golden needles, and so far at least, the monomodal fully automated approach had not.
I am not at liberty to disclose the name of the Senior Engineer at Google who sold out. We anticipate several civil and criminal investigations and have a hold in place. Google had previously collected all of the engineer’s email as part of preservation. They had a crack internal e-discovery team and followed best practices. We had also included the engineer in our first round of review. We did so based solely on his place in the upper hierarchy. Plus, his particular position gave him almost full security access: level nine. Everyone with level nine security clearance, all five senior engineers, was included in our investigation.
The Google engineer did not have any direct involvement with the transactions at issue. We had no reason to give his ESI any special scrutiny; in fact, he came up as a B player in our first preservation list. The official records showed that he had no involvement in the China Space deal that blew up. He was, however, copied on key emails and documents sent to upper management, so we decided to make him a Class A. That meant we looked at his ESI in the first round.
I was very happy with the find. I told the team I was sure this was the big break we had been hoping for. This Google engineer fit the classic profile for a corporate fraudster. His other email made it clear that he was primarily thinking about retirement and travel, especially in Asia. He also seemed to have few family members or much of a personal life. I told my reviewers that these documents looked like smoking guns to me. Not only that, the guns were all good for us and bad for them. That always made the find of strong evidence much more enjoyable.
I explained to them that although I thought the documents were great, I would not know for sure how great until I got a response from my whole trial team. I especially wanted to hear from the attorneys who knew the witness statements from the blitz they took last month. They had a better grasp of the merits of the case than I did. Not really my job. I just had to get the facts. They had to win the case. But I did know that these emails would make their job a lot easier.
I ended my meeting with the reviewers by exchanging thanks and congrats, and said to take the rest of the day off. They would still have a lot to do over the next week to finish the search project. There could be more hot documents like this. I also reminded them the Borg could find these same hot documents too, and maybe more. Although I shared with them my belief that this was unlikely.
Feeding Steak to the Sharks
Next, I had the great pleasure of telling the trial boys what we had found. I prepared a short memo with the five hot emails attached and emailed the memo to the whole trial team. I told them that although odd, these emails looked very important to us. What did they think? I have found it is best to let them draw their own conclusions.
Less than fifteen minutes later I got my first response. This was a home run with the bases loaded. Each trial member checked in and agreed. Everyone speculated as to what went on between the Google Engineer and the China Space Vice President. It looked pretty obvious to us, and we knew most judges would see it that way too.
They each came up with different ways the emails could be used to destroy China Space’s case. They then started plotting on the best ways to spring this new evidence on the other side. They wanted to use it to try to bring this case to a rapid end. They expected to be able to get a settlement where China Space paid Google for full and complete releases. The emails were even more powerful than I had anticipated. The trial team was almost unanimous in thinking that they were sure-fire SJ material. The one naysayer urged caution until we saw what other email might be in the system. Still, the energy level of the trial team was high. We were on a flat fee, and we had already gone past the pleading stage. Settlement now could be very profitable for both our client and the firm. Notice I said client first, although that was not what I was thinking.
There was a preliminary mediation scheduled soon. The timing was perfect. The first meeting was scheduled for after our review had to be complete, and after the sample sets had to be disclosed per the QT protocol, but before the actual production of documents. We would not have to disclose the killer emails before the mediation. However, both sides knew that some key document exchange at the first mediation hearing was typical and to be expected. That helped make the first mediation hearings more interesting. The mediation metrics for last year showed a 33% settlement rate in early mediations for corporate suits with $10,000,000 or more in dispute. Smaller suits always had a lower settlement rate. Google and China Space both knew these same statistics, and so both sides would come loaded for bear.
Google Gets the Good News
We all then got on the phone with Linda. She was thrilled. She asked me which team had found it. She was even happier to learn our Federation had found them, not the Borg. Linda then listened to the trial team talk on the merits, implications, strategy, and possible settlement range for the upcoming mediation. Later that day I called Google and had a one-on-one with Linda to discuss how to handle the Borg team. We agreed to be very cautious and made a plan to create clear proof of the Borg team default.
The Borg could still find these same emails, of course, and in that case we would not have proof of defective software. But I doubted that would happen. It would take blind luck to turn up this handful of emails in a random sample of four million documents. I knew from the reports how their AI was training. I doubted the AI they were creating would ever see any connections with the hot, but odd, emails we had found. Linda and I finalized our plans for the end game of the Borg battle. Resistance was not futile.
I then shared the good news with some friends in my firm and went home and took a nap. Victory or not, the weariness of the past nine days was taking its toll.
Distractions Along the Way
The rest of the week was a blur. Two emergencies came in from opposite coasts. One was simple to fix with just a few hours intervention. One was a real mess and clean up took days to plan and implement. Somewhere in there a new review project came in.
I needed to do some ECA on the new project as the local liaison was unavailable. I started the project by creating a custom legal culling plan for this data set. I also did some of the easy and quick culls unique to this data. Then I pushed the auto-cull button the software developers had added at my request. It automatically identified and segregated all ESI that could not be searched by machine training. That basically means the elimination of most non-text documents. They are either searched separately, or ignored, depending on the facts in dispute.
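The auto-cull step described above can be imagined as a simple filter. This is only a minimal sketch: the function name, the extension list, and the sample file names are all my own illustrative assumptions, not the vendor's actual feature.

```python
import os

# Hypothetical sketch of an "auto-cull" pass: segregate ESI that cannot
# be searched by machine training (mostly non-text files). The set of
# text-searchable extensions below is illustrative, not exhaustive.
TEXT_SEARCHABLE = {".txt", ".msg", ".eml", ".doc", ".docx", ".pdf", ".html"}

def auto_cull(paths):
    """Split a collection into machine-searchable and segregated files."""
    searchable, segregated = [], []
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        (searchable if ext in TEXT_SEARCHABLE else segregated).append(path)
    return searchable, segregated

docs = ["memo.docx", "launch.jpg", "contract.pdf", "telemetry.bin"]
searchable, segregated = auto_cull(docs)
# searchable -> ["memo.docx", "contract.pdf"]
# segregated -> ["launch.jpg", "telemetry.bin"]
```

A real tool would of course inspect file contents and metadata, not just extensions, but the division of labor is the same: the segregated pile is searched separately or ignored, depending on the facts in dispute.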
I always found these sixth-step challenges to be particularly interesting. Or as some of my reviewers would call it, more crap with the top green square, the one on top of the Review Column. They had not yet memorized all of the numbers. But they did know I was not talking about the kind of tech-culling that vendors do, such as deduplication and de-NISTing. I was talking about legal judgment based culling.
I would usually do my culling by using both judgmental and random samples, plus some quick searches. I would also look at the data in various ways using my software’s visual analytics. Those rapidly improving graphic features are what I once heard Jason Baron at Legal Tech call the next big thing in computer assisted review. I was already looking forward to the next generation of software that I saw in a recent private showing. Hang on to your hats folks, the software is going to get insanely good fast.
As I coded a few documents during the ECA for the new project, my software’s machine learning functions, which my vendor calls Intelligent Review Technology, automatically kicked in. I could start displaying the probability rankings immediately, if I wanted to. But usually I would wait until I felt, or we felt if it was a team effort, that some kind of critical mass had been reached. The probability rankings were not too helpful at first because there were only a small number of coded documents. Plus, I did not want the ranking distractions. I did not think the rankings would subconsciously influence my evaluation, but I knew that several information scientists disagreed. There was some interesting research going on now on that topic. Ultimately I think it is a matter of learning, understanding, and experience, just like everything else in life.
Machine Training Workflow
While dealing with these new projects I continued to supervise both of my review teams in the Google project. The Federation team finished first, of course, by three days. Both teams had continued until they reached a point where the number of new relevant documents found was sparse. Another indicator was when the only new documents the machine would find were of the same relevancy type, just more of the same. At that point you know you are probably done. You never know for sure, of course, but that is what the final sample test was for. That is step number six in the standard predictive coding circle chart. Although it is often called the Random QC Test, to be technically correct it is a Quality Assurance test, not control, but that subtle distinction still escapes most people.
Step six will either confirm the reasonability of your review efforts, or not. If you passed, you are done with the iterated searches, and can move on to step seven in the predictive coding flow chart, Proportional Final Review. In the EDBP, which breaks it down into greater detail, the next step after Computer Assisted Review is Protections, Step Eight.
This has been one of the hardest steps in e-discovery for years.
If you failed the final quality assurance test, which meant your scores were too low or a hot document was missed, or several new types of relevant documents were found of some significance, but not necessarily Hot, then you had to do further rounds of machine training, not to mention other types of search. You would have to return back to the iterated steps Four and Five in the predictive coding circle. My reviewers call that a Bad Robot moment. So far it has only happened to us once.
There are metrics involved in the final random test, and the whole process is much more objective than it ever was in the early 2010s when advanced search first started to catch on. There is no question about that. But there could sometimes still be a tremendous amount of subjectivity involved in the evaluation of the tests. We would often have to seek judicial guidance, even though the legal standard was fairly low, mere reasonable efforts, not best practices. Lawyer argument remains a constant, even as our topics of argument rapidly change. That is the essence of our adversarial system of justice.
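To make the final random test a little more concrete, here is a minimal sketch of the kind of arithmetic it involves: estimating elusion (the rate of missed relevant documents in the discard pile) from a random sample, with a simple normal-approximation confidence interval. The function name and the sample numbers are my own illustrative assumptions.

```python
import math

def elusion_estimate(sample_size, relevant_found, z=1.96):
    """Point estimate and 95% normal-approximation interval for elusion:
    the proportion of relevant documents hiding in the null set, based on
    how many turned up in a random sample of it."""
    p = relevant_found / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# e.g. 3 relevant documents found in a 1,500-document random sample
# of the documents coded non-relevant
point, low, high = elusion_estimate(1500, 3)
# point -> 0.002 (0.2%)
```

Whether a 0.2% point estimate passes the test is exactly the kind of judgment call the paragraph above describes: the arithmetic is objective, but the evaluation of reasonable efforts is still argued by lawyers.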
The Final Search Reports
Both teams finished the final random sample testing at the same time. We were all afraid that the Borg might get lucky and find our super Hot documents, but they did not. Big sigh of relief. The Confusion Matrix scores had the Federation team ahead of the Borg by 20% to 30% in all eight categories. So did the other measurements we used. It was still all smiles on my team. My reviewers had worked hard and they were glad it was over. They had high hopes they’d get paid, and paid well.
CONFUSION MATRIX
| | Truly Non-Relevant | Truly Relevant |
| --- | --- | --- |
| Coded Non-Relevant | True Negatives (“TN”) | False Negatives (“FN”) |
| Coded Relevant | False Positives (“FP”) | True Positives (“TP”) |
- Precision = Positive Predictive Value = TP / (TP + FP)
- Recall = True Positive Rate = 100% – False Negative Rate = TP / (TP + FN)
- Elusion = 100% – Negative Predictive Value = FN / (FN + TN)
- Accuracy = 100% – Error = (TP + TN) / (TP + TN + FP + FN)
- Error = 100% – Accuracy = (FP + FN) / (TP + TN + FP + FN)
- Fallout = False Positive Rate = 100% – True Negative Rate = FP / (FP + TN)
- Negative Predictive Value = 100% – Elusion = TN / (TN + FN)
- Prevalence = Yield = Richness = (TP + FN) / (TP + TN + FP + FN)
But what really counted was my weighted Elusion Test. The Borg had missed all of the Hot game-over type documents that we had found. But the Borg Queen, Siri, did not know that yet. She thought she had done ok. She still had no idea there was a shadow team that not only beat her in every category, but blew her away in what really matters, the battle of persuasion.
I couldn’t wait to see her face when she found out at the Mediation that she had missed all of the Hot docs. Normally vendors don’t attend a mediation, but of course Siri considered herself special. She had wiggled her way into attending. Apparently she had convinced the GC that her friendship with the mediator might help. How naive. Our trial lawyers didn’t care because they love an audience. This would be their time to shine. Another pretty face would just make the show even better. These trial lawyers were such hams.
The Mediation
After introductions the distinguished looking mediator gave his usual opening speech: I’m so neutral, I’m so great, with my near divine help you are going to settle your case and settle it today. He did it well, but I’ve heard it so many times before.
Then it was plaintiff’s counsel’s time to speak. She was good. Her allotted thirty minutes turned into forty. It was as polished as any closing statement at trial that you are ever going to hear. She even had me going there from time to time when she would make a particularly persuasive point. Then I would remember the goods we had on China Space and breathe a little sigh of relief. Without our Hot docs this would have been a hard case for Google to win.
China Space had a few surprise documents of its own. Their attorney brandished them with a flurry of paper. Our lead trial man looked them over when she solicitously handed him a copy. He said quite loudly, intentionally interrupting, yeah, we’ve seen these all before. That was a lie of course, we had only seen one. The others were by third parties and could not have been in our collection. There was no way we could have found them. They were really not all that important, especially compared to what we were about to lay on them. The fact that these documents were all they had prompted another big sigh of relief from everyone on our side.
After forty minutes of smooth talking their version of the facts, China Space’s attorney finally stopped. At that point we all knew, except for Siri, that victory for Google was now certain. Our client was fairly clean, just as they had told us all along. We all sat back in our chairs, relaxed, smiled and waited for our man to begin. The other side could sense that something was up. Our body language was projecting a confidence level that went through the roof. The senior attorneys on the other side already knew they were screwed. Their young associates looked confused. They had that deer-in-the-headlights look.
My head trial lawyer did not disappoint. When he got to the document slap down he had all eyes on the screen. I had the pleasure of advancing the slide to show the key email. It was magnified five times its normal size. You could sense a chill sweep through the other side of the table as they saw it and the additional Hot documents that followed. Ah, victory is sweet. Sports is nothing compared to high stakes litigation.
Of course, we could show no real emotion during the joint session. Still, the older lawyers among us were well-versed in the subtle nods, the looks that sent the message to the other side that we knew we had them by the balls.
Our lead attorney then began to squeeze. The merits of Google’s twenty million dollar counter-claim were extolled. Then he went on about the new claims that would be filed tomorrow if the case did not settle today. He even had a copy of the amended counter-claim and gently set that down in front of opposing counsel. She did not even look down. Instead, she held a practiced, slightly defiant, Mona Lisa smile that gave nothing away. But, her magnificent acting did no good. The experienced lawyers among us on both sides had all been there before. We all knew that China Space was totally screwed. At this point it was only a matter now of how much China Space would have to pay. We had seen their quarterly statements and knew they were flush with cash.
After the speeches the parties adjourned into separate rooms. The mediator then began his shuttle diplomacy. He’d go back and forth from one room to another, and sometimes huddle with just some of the attorneys in another room. Everyone on our side was giddy, we did not have to keep a straight face anymore. Everyone, that is, except for Siri. I could tell that she knew she was in trouble, she just did not yet know how deep. No one was talking about where the exhibits came from and she did not ask.
Let the Celebrations Begin!
The mediation ended ten hours later. It was dark out and we all wanted to go home. But we had to stop by a restaurant first. We had to celebrate the Ten Million Dollar settlement Google had just signed off on. How’s that for a defense verdict?
Payment would be made by China Space to Google’s attorneys within seven days. At that time the mutual releases signed today would be released from escrow and all of the claims and counter-claims dismissed with prejudice. A confidentiality agreement sewed it all up tight.
It was a good day for Google, and they were not shy in heaping praise on my firm. The trial lawyers had done a great job, so had the mediator in hammering home the adverse publicity dangers. He made it seem like ten million was getting off real cheap. Little did he or China Space know that we would have taken seven. We had lied to the mediator all day and told him ten was our absolute bottom line. He was an experienced mediator. He probably knew we were lying, but that’s all part of the game. Too bad he didn’t join us at the party. It was not a swank place, but it sure had its share of interesting characters.
Yes, it was a good day. Good for everyone on Google’s side except for Siri. She had not joined us. On our way out I thought I saw her in the mediator’s parking lot talking to a China Space engineer. She was showing him her special glasses. She was probably taking credit for finding the key documents. That would be the natural assumption. I think she was scouting out her next big project. A natural-born saleswoman that Siri. Too bad her company’s software and methods were flawed.
The Borg Are Defeated
I called Linda the next day at Google. Asked what she wanted me to do about Siri and the refund. She said just deliver the last reports to her and she would take it from there. She was right. It was better for me to stay out of that entirely. I trusted Linda. I knew she wanted to get that money back as much as I did. For once I did not have to do anything.
A week later I got the call. Linda said it was done. Siri’s attorney had been relatively easy to deal with. The vendor seemed glad to be able to keep 25%. They knew they had screwed up. Google got 75% of their money back. That was outstanding! It meant that my team on a contingency would not only get paid, but they would get a nice bonus too. Before she went on about the details I asked Linda if she’d mind telling my review team the good news herself. I knew that would mean a lot to them. She agreed and we set a time for that afternoon.
At the call Linda laid praise on us all and told the reviewers how it turned out. Then Linda told me something I didn’t know. She said she had checked the flat fee agreement with my law firm. She found a provision in the discovery clauses that provided for a special bonus payment at the discretion of the client, which meant her, Google, in the event that documents were discovered that led to an early and favorable resolution of the case. She had talked it over with the GC and they had decided to exercise their discretion and make the bonus award. I had forgotten that provision, and frankly, did not think it applied because we had not been selected for the review services, the Borg company had. It was a very generous interpretation of the contract.
Then Linda told us the amount of the bonus award to the reviewers and my firm. That put the total fee payment to each of the reviewers at just under $40,000 apiece. The reviewers went crazy and thanked Linda profusely. They had never received such a big payday, and for less than a month’s work. Few people had. My firm would also be pleased, but it was small potatoes compared to the rest of the fees. Ah well, at least my reviewers got some immediate gratification. Mine, if any, would have to wait until the end of the fiscal year.
Conclusion
Two days later I got a FedEx package in the morning. In it were two first-class tickets to Honolulu and a note. Turns out my Federation review team was going to have a big party that weekend in Hawaii. I was invited, all expenses paid, along with a guest. A nice gesture, and one they knew I could not refuse, especially since they had already bought the tickets. I changed all our plans and got ready for the trip. It’s not every week that you vanquish the Borg. I’ll let you guess who I took with me.