This is a continuation of Journey into the Borg Hive: Parts One, Two, Three and Four of a SciFi Saga. This series is about a legal search project set sometime in the not too distant future. For a look at fifty years from now check out A Day in the Life of a Discovery Lawyer in the Year 2062: a Science Fiction Tribute to Ray Bradbury.
Deep Into Borg Territory
We were now entering the peak of the search project. The Borg team of eight drones had been reviewing machine-selected documents for over a week. They were averaging 50 documents per hour, which was typical for them. Each was able to review eight hours a day, working ten-hour shifts to do so. That meant 3,200 documents per work day (8*50*8=3200). After five days of work and 320 hours of review time (they took the weekend off with my approval), they had reviewed 16,000 documents (3200*5=16000). Out of that total, only about 2,000 were coded relevant. They were proud of a job well done. They expected to review another 16,000 documents, at least.
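For readers who like to check the arithmetic, here is a minimal sketch, in Python, that simply reproduces the Borg throughput figures stated above (the inputs are taken from this installment, not from any software):

```python
# Borg team throughput, using the figures stated in the text as assumptions
reviewers = 8          # drones on the review team
docs_per_hour = 50     # average linear review speed per reviewer
hours_per_day = 8      # review hours per reviewer per day
work_days = 5          # they took the weekend off

docs_per_day = reviewers * docs_per_hour * hours_per_day   # 8 * 50 * 8 = 3,200
total_hours = reviewers * hours_per_day * work_days        # 8 * 8 * 5 = 320
total_docs = docs_per_day * work_days                      # 3,200 * 5 = 16,000

print(docs_per_day, total_hours, total_docs)               # 3200 320 16000
```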
They were paid by the hour and always hoped a project would go on longer than expected. They lacked any real motivation to work hard. They seemed perfectly happy to let the computer do all of the searching for them. They seemed satisfied not to have to think, or do anything but code what was put in front of them. Running searches on their own to find good documents was the last thing on their little hive mind. They did not care how many relevant documents were found. It was all the same to them. They were a perfect Borg review team. Just did what they were told.
I was deep into the hive mind at that point. They thought I was one of them. They did not know that I was an infiltrator. I smiled and nodded at their reviewer-consistency quality control tests. They thought those tests satisfied my only concern, inconsistent reviewer coding. But I knew more than I let on. The truth is, I was more convinced than ever that even if all eight of their reviewers were coding consistently, a claim I very much doubted, their approach would still fail.
At this 16,000-document point they had completed 16 iterations of machine training. They would review exactly 1,000 documents per round, not a document more or less. I would be sent a report at each iteration point. The number of relevant documents found in each round of 1,000 varied widely. Sometimes they would find only 4 documents, which is roughly the number you would expect from blind chance alone in a document collection with a relevance yield of only 0.42%.
Many times several hundred of the thousand documents the computer put forward for the review team would in fact be relevant. Over the 16 rounds they averaged 125 relevant documents per round (2000/16=125). That was 12.5% of the 1,000 documents the Borg software selected for review. Not a particularly good overall precision rate. Still, they pointed to these indicators as positive for quality control. The average yield of 12.5% relevant was almost 30 times higher than a random chance yield of 0.42% (12.5/.42). That is what they focused on to try to reassure me.
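Here is a short sketch of that same per-round comparison, treating the 0.42% collection yield and the Borg totals reported above as given:

```python
# Borg per-round precision versus blind chance (figures from the text, not software output)
yield_rate = 0.0042        # 0.42% relevance yield of the collection
docs_per_round = 1000      # fixed Borg review batch per iteration
rounds = 16
relevant_found = 2000

chance_per_round = yield_rate * docs_per_round   # ~4.2 documents expected by blind chance
avg_per_round = relevant_found / rounds          # 2000 / 16 = 125
precision = avg_per_round / docs_per_round       # 0.125, i.e. 12.5%
lift_over_chance = precision / yield_rate        # ~29.8, "almost 30 times" chance

print(chance_per_round, avg_per_round, precision, round(lift_over_chance, 1))
```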
In addition to the daily reports I’d be notified throughout the day of any documents that fell into a grey area for them, where they were not sure about relevance. I’d look at the document, make a decision, and let them know. Every now and then I would have to ask my trial lawyers for a final ruling. They had endured many extended discussions with opposing counsel regarding relevance and were the final arbiters of our relevancy decisions.
Sometimes I would give the reviewer drones a long explanation on relevance, but it usually was not required. At first I received a lot of relevancy questions from the Borg eight, but after a while that died down to only a couple of questions a day. I figured that would happen and was not impressed. I was glad that I had a second secret review going, secret at least from the first team, the Borg team. I expected that it would become very public soon when the Borg approach crash-landed in this big case.
I would not share my decisions or the documents found by the Borg with my shadow team, but made sure my decisions were consistent between the two teams. I was doing double work, but the extra security made it worth it to me. I would never rely on that Borg team alone.
Checking In With My Team
It was 5:00 PM EST, time for the daily video conference with my personal review team. This small band of three had taken to calling themselves the Federation. I was in Florida, and my lead reviewer in NYC, the same time zone. But our two other team members were in Northern California and Hawaii. The three of them worked out a review schedule covering a 16-hour sequence each day, six hours apiece, with one hour of overlap between shifts for continuity. That way we could keep a near-continuous operation going, with just eight hours off for rest and quality control. While they rested, I did my work of looking for anomalies. I did the same quality control for both teams, as far as the software allowed.
This relay-like schedule made it easier for us to coordinate the machine learning sessions – the iterations – and to run quality control checks on coding consistency. We had a lot of other secret sauce going to maximize efficiency and quality.
The three Federation reviewers had worked a total of 90 hours that week, 30 hours per reviewer. During that time they completed only five iterations, not sixteen like the Borg. Yet they had already found more relevant documents than the Borg group by a factor of two. Just over 4,000 relevant documents had been found, spanning over thirty different document types. Keyword and concept searches proved to be big contributors. The first seed set was well populated with a diverse group of relevant documents.
In days one and two of the project all my reviewers did was multimodal search to prepare a large seed set to start the training. They found over 1,500 relevant documents for the seed set. We did not begin to use predictive coding until day three. The Federation did two large iterations the third day. They did three more machine training sessions over the next two days.
It would take the host computer a few minutes to do the calculations, sometimes as long as thirty or forty-five minutes. My reviewers would plan their breaks around that. My Hawaii reviewer claimed she would often review on the beach and go surfing on the breaks. I did not care how or where they did their job, just that they did it well. I had no worries about that with this group. They used the software chat feature regularly and kept their coding pretty consistent. They had worked together many times before and did not hesitate to ask me for rulings. We all knew that the hive mind for relevancy was a good thing, so long as it did not involve total assimilation.
I could see from the visual graphs in the software reports that the Federation’s precision rate was increasing. In other words, the rate of relevancy identification was already improving in each round. That compared with the Borg approach, where after 16 rounds the precision was still going up and down, with extremes and no pattern, at least not yet. In some rounds they would find many relevant documents, and in others only a couple. There seemed to be no rhyme or reason to it. The Borg showed no improvement so far, but I knew their AI would eventually catch on, and that soon they too would start to improve precision as we were already doing.
The Federation’s accelerating precision was beginning to slow down their review. The fifth day was the slowest of all. As the machine grew smarter and served up a higher percentage of predicted relevant documents to code, the review rate slowed down. That is because it takes longer to identify and code a relevant document than an obviously irrelevant one. The percentage of obviously irrelevant documents was shrinking with each round.
All told, my shadow Federation team had managed to find 4,000 relevant documents by a manual review of 10,000 documents. That was a precision rate of 40%. This was terrific, especially as compared to the Borg’s precision of 12.5%. Over the five days the three reviewers put in a total of 90 hours of work (30 hours apiece). That put their review speed at 111 files per hour, still over twice the Borg rate.
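The Federation side of the ledger works out the same way; here is a sketch that simply plugs in the figures reported above:

```python
# Federation team metrics, using the figures stated in the text as assumptions
relevant_found = 4000
docs_reviewed = 10000
reviewers = 3
hours_per_reviewer = 30

precision = relevant_found / docs_reviewed       # 0.40, i.e. 40%
total_hours = reviewers * hours_per_reviewer     # 90 hours
review_speed = docs_reviewed / total_hours       # ~111.1 files per hour

borg_speed = 50                                  # Borg linear rate, for comparison
print(precision, round(review_speed, 1), round(review_speed / borg_speed, 1))
```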
Most of the time when drilling through irrelevant documents, especially at first, my reviewers attained speed bursts of up to 500 files per hour. They were, after all, the best in the business. Each of them liked to review, and all were star graduates of e-discovery team training. But the overall rate averaged down to only 111.111 files per hour. That was because after the initial seed set, when the software started serving up documents for review to improve its training, they would break from mere linear review of those computer-selected documents, the Borg way, and do their own side searches from time to time. They were allowed to think, no, encouraged. I trusted them, their judgment and search skills.
They were doing multimodal review as they deemed necessary to find more relevant documents for the machine training. The occasional break from review to search slowed them down, but allowed them to find and retrieve many more documents than if we relied on the computer alone.
This is what I called hybrid multimodal review. It was multimodal in that it used many search techniques, not just predictive coding, and it was hybrid in that it used human intelligence from highly skilled attorneys not only to code, but also to guide the search. It did not rely solely on predictive coding (monomodal), nor solely on computer intelligence (fully automated), to do everything but make relevancy determinations. That was the Borg way, a method that my team and I felt minimized, rather stupidly, the skills and intelligence of the human reviewers.
The key statistic for comparative analysis is that so far we were able to find 44.44 relevant documents per hour (4,000 relevant documents found divided by 90 hours of review work), whereas the Borg approach had found only 6.25 relevant documents per hour (2,000 relevant documents found divided by 320 hours of review work). So far the Borg approach was only 14% as efficient as our multimodal method (6.25/44.44).
The Borg approach had found 2,000 relevant documents in reviewing 16,000. The 2,000 relevant documents found fell into only twelve different document types. That represented a precision rate of 12.5%, a rate they bragged about as 30 times better than chance (0.42%). They did not know, as I did, that the multimodal approach had a precision rate of 40%. This was over 95 times better than random chance precision and 3.2 times better than the Borg (40/12.5).
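Putting the two teams side by side, the efficiency comparison works out like this, again as a sketch with the section’s totals plugged in as assumptions:

```python
# Head-to-head comparison: relevant documents found per hour of review
federation = {"relevant": 4000, "hours": 90, "precision": 0.40}
borg = {"relevant": 2000, "hours": 320, "precision": 0.125}
chance_yield = 0.0042   # 0.42% random-chance baseline

fed_rate = federation["relevant"] / federation["hours"]   # ~44.44 relevant docs/hour
borg_rate = borg["relevant"] / borg["hours"]              # 6.25 relevant docs/hour

print(round(borg_rate / fed_rate, 2))                     # ~0.14, Borg is ~14% as efficient
print(round(federation["precision"] / chance_yield, 1))   # ~95.2 times better than chance
print(federation["precision"] / borg["precision"])        # 3.2 times better than the Borg
```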
Aside from the obvious inefficiency, the lower number of document types is what really concerned me. I was convinced that, despite their claims to the contrary, the irrational blind-chance Borg approach would miss too many outliers, maybe even miss obvious documents. I expected the Borg methods to increase the number of relevant documents found as the iterations ground on. But I did not think they would expand and change the document types after a certain amount of training. Random chance can only take you so far. I knew human intelligence would win in the end.
To be continued …