This is a continuation of Journey into the Borg Hive: Part One of an e-Discovery Sci-Fi Saga. You need to read part one of the story for this continuation to make sense. This series is about a legal search project set sometime in the not too distant future. For a look at fifty years from now check out A Day in the Life of a Discovery Lawyer in the Year 2062: a Science Fiction Tribute to Ray Bradbury.
Review of the 2,401 Sample
I started the project by reviewing all 2,401 sample documents at once, rather than taking them in sets of 200 as the vendor recommended. That was for amateurs. I knew I could take advantage of display sort alternatives by doing the entire group at once. I would use those methods to quickly knock off certain files with bulk-coding. I called that my extended spam-type culling. (Regular spam had all supposedly been caught by the client’s spam filters, but some always got through, even at Google.) Spam for me meant any type of file that could not be relevant. It was like a custom de-Nisting. I also knocked off most of the newsletters that way. Yup, even Google employees still subscribed to a few, not to mention Amazon orders.
Before I began the bulk coding I looked over the whole set using various visual displays. I looked at the most popular file types, email subject headers, and the like, including the obligatory user display with the connecting lines. I also glanced at the timeline display. Then I sorted by file size to quickly review and knock off the files too small for meaningful content. One actually had content (I still quickly glanced at each to be sure), but it was irrelevant minimalist content nonetheless.
Only then did I do the extended spam, custom de-Nisting type bulk coding. After that I settled in for an alphabetical listing where I could easily see all of the similar emails and code them. I found one large collection that was some kind of group file having to do with internal IT announcements. As at most companies, the IT department cluttered everyone’s email with tons of important announcements, including maintenance, upgrades and virus alerts. Second came the stupid HR announcements, and then the obligatory celebration emails.
After the first hour of fun work like that, where I knocked off the easy stuff and got a feel for the database, I settled in for the hard grind: the remaining approximately 2,000 emails that required individual scanning to determine relevance. I put the software into power-review display mode, the one that maximizes the display size of the document under review. Then I began to use the hot-key combinations for irrelevant and training coding. Hot keys were much faster than using a mouse to check boxes and click. Mouse work involved two steps, and sometimes it could take a millisecond to get the cursor in the right position. Hot keys were always much faster, unless you were bulk coding. Still, occasionally I had to use my mouse to move around and see something outside of the default view.
I could do this review one-handed if I wanted, and would sometimes do so for a change, or to drink coffee or something. But I found it was faster to use both hands and keep them both hovering above the keys to tap and tag a document as irrelevant and move on. So that was my standard technique. I could code some documents that way in a second. Of course, this required very high-speed Internet. This was a necessity for me, not a luxury. So too were the latest computers I used to do the review. Sometimes I would use multiple monitors, but often I would just use my favorite laptop, the new MacBook Pro, and move around for variety; all wireless of course.
Attaining Ideal Work Flow
My favorite spot to code in the Winter was next to the pool. With global warming, Florida was an especially good place to code outside and catch some sun at the same time. I’m a big believer in multitasking, but when I review, my focus is 100% on the documents and keeping my mind open to evaluate any possible relevance. The millisecond the next document enters my consciousness, the mental calculus begins as to whether there is any way it could be relevant. Sometimes I use music to help keep the concentration levels high, sometimes not.
Every now and then I would encounter a document that took a while to recognize and identify as relevant or not. Sometimes that would require paging down to the bottom of a document not in view, or changing the page orientation ninety degrees (although I usually just twisted my head sideways as that was slightly faster). Sometimes it would even require moving on to the next pages, to see earlier strings in an email chain, or later portions of a Word doc or a PowerPoint. But that was rare. The screen was large enough to see most of the document in one glance.
I had trained myself over decades of computer use, including thousands of hours of gaming, to take in a screen all at once, and speed-read words where necessary. I am a big reader anyway, so this was all second nature to me. I also had a few highlights turned on for certain key words, and every now and then this also helped for quick recognition. Usually, however, a detailed read was not necessary. A quick glance told all.
I had read so many emails of so many different people over the years that I knew the patterns. I knew the kinds of things people would say and not say in emails, which still constituted the bulk of all my review work. I had seen it all before.
I was moving fast but I was not in a hurry. That is always a mistake. It is better to be in a very relaxed timeless state, what Csikszentmihalyi calls Flow. Sometimes the flow was strong and nothing could stop it. Other times it was weak and easy to disrupt. Sometimes even a slight hangup in page reloading would disrupt the flow.
These hiccups were a common occurrence. I was, after all, working remotely on a computer located over 2,000 miles away. Even though I had the best Internet connections, there would sometimes be slight delays, usually just a second or two, but sometimes longer. I trained myself to take mental mini-vacations when that happened, rather than get annoyed. That helped keep up the timeless, concentrated flow. I’d just rest with a quick mind-blank. Maybe I’d look around or rub my eyes. It was critical to keep your mind alert and open, and also important not to over-strain your eyes.
I have trained myself over the years to take as long as needed to ID a doc, but no longer. Sometimes I would change the work flow and put electronic sticky notes on a document that I might want to refer to in the future for some reason, such as to explain my reasoning on a close question, but that was rare. I did not rush and I was always careful to be careful. I did not want to make a mistake.
This was not a race, it was a search for evidence in an important legal proceeding. If it took five minutes or longer to ponder a document, then so be it. If I had to go back and double check a document I had already marked, then so be it. Don’t go so fast that you lose your ability to stop suddenly where necessary to be sure. Some documents were tricky and hard to identify. Sometimes a document would look irrelevant at first glance, but upon further reading you would see a relevant statement. You needed to be sure of your coding before you clicked to the next document. This took a high skill level to do right. Every experienced reviewer knows that.
When I did finally run into a relevant document I would read it very carefully and think about its significance, noting its time, the persons involved, and the language used. I would also evaluate its weight. If it was very important, I would mark it as Highly Relevant, but that was rare. I did not find any hot documents in this random sample review.
I also trained myself over the years not to be distracted by interesting irrelevant documents, typically emails with jokes or sexual references. Sometimes you would run into glimpses of human drama, emergencies with kids, nasty comments. It was tempting to read on, but better not to. Every now and then you also would run into a custodian with too much personality. You know the type, I’m sure. They often use politically incorrect speech and are outspoken. I would just make a mental note of the loose-lipped custodian and move on. Sometimes I would break the iron discipline and spend a minute amusing myself by reading an irrelevant email. It was rare, but it happened. I knew from experience that could be a real time killer, so it usually took quite a bit to get me to linger. It is better not to get distracted in any way or start fantasizing about the content, or the custodians. That slows you way down.
I was concerned about efficiency, but the main thing in review is always accuracy, not speed, especially in the first random set review. It is important to set up the initial benchmark correctly. An inaccurate prevalence calculation could throw all of the quality assurance tests off, not to mention slow down the machine learning. I took my time and cruised through at an average of 200 files per half hour.
I would take a break every thirty minutes or so, sometimes longer. I did not allow phone interruptions, or people interruptions, unless I was on a break. I preferred to work alone. I often wore headphones. Sometimes a break would last an entire day, either for personal reasons, like a family event on a weekend, or sometimes the break was work induced. Another case would interrupt and take priority. That is just the way the practice of law goes. I had no trouble picking back up the one time that happened here. A break for several days can throw you off, but just one day is not a big deal for zoning back into the relevancy gestalt of this case.
First Review Task Completed
I completed the review in three days, including the lost day. It took me seven hours of review time, just a little more than expected, but not much. I averaged 343 files per hour, not bad, but hardly my best time. Note I do not include preparation of my obligatory memo to file as part of my review time, nor the periodic quality control work I built into all of my reviews. That is part of analysis and is separately described in final billings. Even though this was a flat fee, we still kept track of time and task descriptions.
My concern that this collection might be unusual and have a high prevalence rate was misplaced. This was a typical low yield collection, and so the review went about as fast as expected. I knew low prevalence was the norm in legal search, and it would only get worse in the future as the data explosion continued. The more documents, the larger the haystacks, the lower the percentage of relevant needles. Although the amount of relevant information, the needles, so to speak, was also expanding, it was not keeping pace with the overall explosion of data. The information noise is always growing faster than the music. Plus, human limits on persuasion remained; the seven plus or minus two rule of trial practice would never change as long as the jury trial remained a fundamental right. The number of electronic communications was ever-increasing, but not discussions of topics relevant to most litigation. The issues raised in the dispute between Google and China Space were many, but few communications in any way concerned them.
Analysis of the Sample
My next step was to analyze the metrics of my sample review. My review of the 2,401 sample found ten hits, ten relevant documents. None highly so. A little high in my experience, but not unexpected because this was a big, complicated case and the trial boys still hadn’t narrowed any issues. They were waiting for the first mediation and court rulings.
The calculation required for the first baseline metric of prevalence is simple, just division: 10 ÷ 2,401 = .00416493127863. I rounded off and called it a prevalence rate of 0.42%. Since we were dealing with a data set (corpus) of 4,079,293 computer files, that meant the spot projection of the number of relevant documents likely contained in the corpus was 17,133. Our model of perfection was to retrieve that many relevant documents from the corpus. But unlike the Borg in Star Trek, I did not have to attain perfection. Reasonable, proportionate efforts are all that are required by human law.
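For readers who want to check the arithmetic, here is a minimal sketch of that division, rounding, and projection in Python. The variable names are mine, not from any review platform:

```python
# Prevalence math from the sample review, as described in the text.
sample_size = 2401
relevant_hits = 10
corpus_size = 4_079_293

prevalence = relevant_hits / sample_size        # 0.00416493...
rounded = round(prevalence, 4)                  # 0.0042, i.e. 0.42%
spot_projection = round(rounded * corpus_size)  # 17,133 relevant documents

print(f"prevalence {rounded:.2%}, spot projection {spot_projection:,}")
# prints: prevalence 0.42%, spot projection 17,133
```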
To get a more exact confidence interval in a low prevalence dataset like this, we needed to make binomial calculations. We find that the 95% probability interval runs from 0.20% to 0.76%. Remember, the center point is 0.42%. Anything below 0.20% or above 0.76% falls into the tails, each of which holds only a 2.5% probability.
This means the document spread around our peak probability number of 17,133 relevant documents ran from 8,159 to 31,003 documents. There was a small chance that there were only 8,159 relevant documents in the dataset. There was an equally unlikely possibility that there were as many as 31,003 relevant documents. Those extremes, at the far ends of the bell curve, were the least likely results within the interval. More or fewer documents than these extremes would fall outside of the 95% confidence interval, in the tails at either end of the curve, shown in blue in the accompanying graph. These tails theoretically stretch out to infinity.
The closer to 17,133 the higher the probability of likely accuracy, with 17,133 having the highest probability (9.5%) of being the correct estimate.
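The binomial calculation behind that interval can be reproduced with an exact (Clopper-Pearson) computation. The following is my own minimal sketch by bisection in plain Python, not the vendor’s software; the function names are hypothetical:

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p); only k+1 terms are needed."""
    return sum(math.comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval, found by bisection."""
    def bisect(pred, lo, hi):
        # pred is False at lo and True at hi; converge on the crossing point.
        for _ in range(60):
            mid = (lo + hi) / 2
            if pred(mid):
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    # Lower bound: smallest p where P(X >= x) reaches alpha/2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p) >= alpha / 2, 0.0, x / n)
    # Upper bound: largest p where P(X <= x) still exceeds alpha/2.
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) < alpha / 2, x / n, 1.0)
    return lower, upper

lo, hi = clopper_pearson(10, 2401)
docs_lo, docs_hi = round(lo * 4_079_293), round(hi * 4_079_293)
print(f"95% interval: {lo:.2%} to {hi:.2%} "
      f"({docs_lo:,} to {docs_hi:,} documents)")
```

Running this lands on the roughly 0.20% to 0.76% rate interval, and the roughly 8,000 to 31,000 document spread, quoted above, give or take rounding in the last digit.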
Meeting with the Borg Queen
I finally met Siri, the notorious CEO of the vendor. I was glad it was a video conference, not just audio, as she was easy on the eyes. She looked much younger than I had expected. Siri wore one of those new computer head-gear add-ons for the iPhone. It could not only project a screen interface, but also had biometric devices and neurophysiological manipulation capacity. She did seem very relaxed. She also had on some kind of matching glove. It probably served some special purpose, but I didn’t know what it was.
I went over the results of the first sample search with them. They seemed uneasy about how quickly I had completed the project. The truth is, most were still stunned by my decision a few days ago to do the review myself. They had expected their contract lawyers to do both the relevance review and the confidentiality protection. Most figured I would change my mind. They thought that was what the meeting would be about, not to announce that I had already finished it. But what really surprised them was the prevalence rate I found. Siri did not say a word, but seemed troubled when she heard it.
The chief scientist blurted out that he had never worked with a yield rate that low. He said they usually have prevalence rates of between ten and thirty percent. They once had a project as low as 3%. He had never seen a prevalence rate as low as 0.42%. He thought I must have made some kind of mistake.
That led to a long discussion where I learned that they never do collections, and do not really pay much attention to that. Another vendor would always do that, or, as in this case, the client did it themselves. I asked them to find out whether they usually got raw data collections of a custodian’s entire collection, or search-filtered collections, such as all emails and attachments that contain certain terms or metadata. They said they would check on that and get back to me right away. We then moved on to the 16(b) hearing prep issues. That took a couple of hours.
By the time we were done with that work, a young man came on camera and said he had completed the collection research for their last three projects. They were efficient, I have to say. He found that in addition to the usual deduplication, de-Nisting and date range restrictions, all had some other search filter in place. Two were simple keyword search filters and another was a more complex multimodal filter that also included a simplistic type of machine coding. Bottom line: I was right. They were used to working with pre-culled data sets. But not only pre-culled, over-culled. The keyword searches, even though very broad and imprecise, were still likely to omit over half of the relevant documents. Their clients had made the fundamental mistake of not feeding the predictive coding search engine all of the data. That explained why they usually had ten to thirty percent prevalence rates.
They got it right away. I could almost see them processing the implications for their software’s fully automated approach. I then began a series of questions, automatically switching into expert deposition mode, and quasi-hostile witness cross-exam at that. No, they had not tested their software on anything less than one percent. This was going to be a first. No, they were not sure how the algorithms would react. Then the sales guys at the meeting went into customer-soothing mode and began assuring me that everything would work fine, it might just take a little longer. The scientist and the techs then joined in and started explaining very rapidly why it would not make any difference. I nodded but was not at all convinced. Did they think I’d never seen a snow job before, just because I’m from Florida?
When I left that meeting my concerns regarding the reliability of the software were stronger than ever. This was likely to be a disaster and there was nothing I could do but watch. The GC would never change his mind. Linda had made that clear to me.
The vendor’s fully automated software would not work with a 0.42% prevalence dataset. It would take forever to get going by random samples, and once it did reach a relevancy vector, the scope of its recall would be arbitrarily limited. I was concerned that the random approach would not catch unusual types of relevant documents that were not included in the random selections. It could even miss highly relevant ones. That could be a recipe for sanction soup, an unappetizing broth if ever there was one. I would rather drink Socrates’ hemlock than have that happen to my client in my case. I had to figure a way out.
A Plan is Hatched
I considered the possibility of a personal confrontation with Google’s GC to talk some sense into him, to explain what a big mistake he had made in picking this vendor. Then I remembered how little hard proof I had of that, yet. I also thought about how some of my partners would react to that approach. No. I had to come up with something else. I should talk to Linda again and explain what happened with the vendor. Maybe she could help. After all, we both wanted to protect the company and steer it clear of sanctions. And neither of us trusted this vendor.
I called Linda the next morning, early, and was surprised to get through to her right away. This was the first one-on-one call I had had with her. She was much nicer to deal with this way. I told her the story of the low yield. I was now convinced the software would miss obvious documents that a hybrid multimodal approach would surely find. Linda said that reminded her of something she had read in the vendor’s contract with Google, something about forfeiture of fees if the software was defective. I asked for a copy of the agreement. After that we both lamented that neither of us could think of anything to do but further paper our files with CYAs, hardly a satisfactory solution. Then I got a call about a new case and helicoptered in to help a stressed-out team in California.
When I got the vendor agreement the next day I read the sales-manlike terms and provisions, most very cleverly favoring the vendor in non-obvious ways. Then I found the defect warranty provision Linda was thinking about. It specifically defined a defect by comparison with another software program running the same search under the exact same conditions. That made it virtually impossible to ever prove a defect and get a refund. But it gave me an idea. It would be a big gamble, but my friends and I would be taking the risk, not the client. It might just work to save the day.
I did not have time to think it through. The 16(b) hearing was this afternoon and I had to complete preparations for my part of the hearing.
Delusions of Duty
I had already endured many meetings with opposing counsel and I was looking forward to a meeting with a judge in the room. Their e-discovery expert was OK, but her hands were tied, much like mine, on the relevancy scope issues. The trial jockeys had final say on that. Both sides had reached a point of barely civil intransigence. The trial lawyers, most of whom looked like their counterparts from the Eighties, were going to argue several relevancy issues at the hearing.
I knew that the behavior of the attorneys would magically change at the 16(b) hearing when they stood for the judge to enter. It was like entering a Sunday School pageant and pretending like you were an angel. I was looking forward to the new improved versions of their personalities. So too was my discovery counterpart on the other side. We did not want to go forward with our respective search projects until the relevancy issues were resolved. We hated re-dos. We were hoping for a clear focus of the issues and had about twenty-five exhibits lined up for projection on the court’s HD equipment. If our trial lawyers were really good, we had a chance of getting the relevancy rulings we needed from the Bench.
Unlike the trial boys, we discovery lawyers did not really care where the initial relevancy lines were drawn; we just needed the lines to get going. After all, the odds of any document actually making it to the final trial exhibit list were over 100,000 to 1 against. The fighting over finer points of relevance was a necessary exercise, we knew, but we also knew it would all shake out in the end. If we did our search job correctly, the few hot documents would inevitably percolate to the top. Nothing else really mattered. The fine-tuning of initial relevancy vectors was not that important, except to avoid re-dos.
This is something that many trial lawyers did not understand. A few of the old and powerful discovery lawyers around didn’t get it either. They were of the old school, where you TIFFed and reviewed everything and fought against all disclosure.
Many e-discovery vendors helped perpetuate these delusions. They added fuel to the fire. They promoted over-discovery and fight-everything tactics, not because they believed in them, they knew better from experience, but because their yearly bonuses depended on it. They justified a little exploitation of their customers as part of their duty to the corporation’s shareholders to maximize profits. These vendors benefitted mightily from over-discovery. They loved a review re-do and forensic fights. They did not have a long-term view of corporate profitability.
These vendors had much in common with the old school lawyers they exploited. The lawyers also justified their over-contentiousness and over-review under the cloak of duty. The so-called duty of vigorous representation of their clients. They would say that all litigation, including discovery, is war! They were not about to compromise their duty to their clients by cooperation and proportionality. This also led to more and more fees. Everyone benefitted from this ill-informed mutual back-scratching, except, of course, for the clients stuck with the bills, and the judges who occasionally had to listen to their bickering. I understand that this has been going on since the nineties.
I was still not quite sure where Siri, the Borg Queen, fit into this, but she did not seem to be one of the old-school vendor types. She seemed bent on constrained review and cooperation. Siri’s problem was that she put too much trust in technology. But, who knows, maybe she was right? I blanked out for a few seconds, or was it minutes? I had spent way too much video-time with the Borg CEO yesterday. I needed to stop daydreaming about Siri and focus on the judge and the hearing this afternoon.
To be continued …