This is a continuation of Journey into the Borg Hive: Parts One, Two, Three, Four, Five, and Six of a SciFi Saga. This series is about a legal search project set sometime in the not-too-distant future. For a look at fifty years from now check out A Day in the Life of a Discovery Lawyer in the Year 2062: a Science Fiction Tribute to Ray Bradbury.
Day Nine: Deeper Into Dark Matter
We were beginning day nine of the search. The Borg, under Siri’s leadership, had completed their 24th round of machine training the night before. In the past three days they had reviewed 8,000 documents. Each of their rounds was always 1,000 documents, some randomly selected, others selected by the growing artificial intelligence of the software. Unlike some Borg-type software, the selection was not all random; only the initial seed set was. In the eight days of review the Borg drones had coded a total of 24,000 documents.
The Borg always worked 8 hours of review time per day. With 8 reviewers that meant 64 hours per day of review. Over the last three days, with 192 man-hours, they had reviewed 8,000 documents in 8 iterations. That meant their average review speed was 42 documents per hour, significantly slower than their average rate of 50 files per hour in the first five days of the review. The reviewers were slowing their pace because the Borg software’s precision was starting to go up, meaning the reviewers were being presented with more relevant documents, which take longer to read and code.
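The review-speed arithmetic above reduces to a simple rate calculation. A minimal sketch in Python, using only figures stated in the story (the helper function name is my own):

```python
# Sketch of the Borg team's review-rate arithmetic.
# All figures come from the narrative; the helper is illustrative only.

def review_rate(docs_reviewed, reviewers, hours_per_day, days):
    """Documents reviewed per hour of total review time."""
    total_hours = reviewers * hours_per_day * days
    return docs_reviewed / total_hours

# Last three days: 8,000 documents, 8 reviewers at 8 hours each.
recent = review_rate(8_000, reviewers=8, hours_per_day=8, days=3)
print(round(recent))  # -> 42 documents per hour

# First five days: 16,000 documents over 320 review hours.
early = review_rate(16_000, reviewers=8, hours_per_day=8, days=5)
print(round(early))  # -> 50 documents per hour
```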
After 24 rounds of machine training they had now identified 6,000 relevant documents out of the 24,000 reviewed. They had coded 2,000 relevant in the first five days, and another 4,000 in the past three days. Their speed of relevant identification was starting to pick up as I had expected. The number of different category types was also increasing rapidly. They were now up to 25 relevant document types. (They had only found 12 at the end of the first five days.) I was getting more and more depressed as I read the latest daily reports. Still, the Borg had found no smoking guns and only a handful of documents from their 6,000 relevant were even moderately interesting to the trial team.
My morning report from the Federation team showed they had now reviewed a total of 15,000 documents and found 10,000 relevant. That was an excellent 66.66% precision rate. This compared to the 24,000 reviewed by the Borg and 6,000 relevant, a 25% precision rate. In the last three days the Federation team had reviewed 5,000 documents and done 7 more rounds of machine training. (In the first five days they had only done 5 rounds, but the first two days were on the seed set, and they often had shorter rounds as the computer became better trained.) That made a total of 12 rounds of machine training as compared to the Borg’s 24.
The Federation file per hour review rate had slowed down as their precision increased, just like the Borg team. In the last three days it was only 93 files per hour (18 hrs per day, times 3 days, equals 54 hours; divided into 5,000). In their first five days they had averaged 111 files per hour.
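The precision comparison between the two teams is just the ratio of relevant documents found to documents reviewed. A quick check of the figures quoted above (the counts come from the story; the variable names are mine):

```python
# Precision = relevant documents found / total documents reviewed.
# All counts are taken from the story's daily reports.

federation_precision = 10_000 / 15_000  # Federation: 10,000 relevant of 15,000
borg_precision = 6_000 / 24_000         # Borg: 6,000 relevant of 24,000

print(f"Federation: {federation_precision:.2%}")  # Federation: 66.67%
print(f"Borg:       {borg_precision:.2%}")        # Borg:       25.00%
```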
What was really depressing was that the Federation team had still not found any highly relevant documents. They had found more interesting relevant documents than the Borg, and were now up to 50 different types of relevant documents, but again, nothing earth-shattering.
Still, there was some hope because the Federation team continued to find new types of relevant documents not seen before. When it all becomes just more of the same, and new types stop appearing, then you know you are probably near the end of your search. So there was still hope we would come across a document that was not only new and different, but also powerful.
My three-person Federation team had been working their usual 6-hour review days for a total of 54 hours over the past 3 days. They were feeling pretty good about their work because the total number of relevant documents they had found was now up to 10,000. They knew they were now on the inside of the bell curve of probable relevant documents. They had moved to the right of the 8,159 document set point, but they still had a long way to go to reach the spot projection of 17,133 relevant documents, the outcome with the highest probability of occurring.
They had hoped to find more relevant documents than that by now, but then again, they also knew that relevancy identification would probably accelerate quickly before it gave out entirely. By that I mean a point where the computer stops finding any new document types and the rate of relevancy slows way down. My reviewers were now thinking that this corpus was not dense at all, that the yield would probably fall to the left of the spot projection at the top of the curve. They were beginning to predict it would come in at less than 15,000. We always guessed like that near the end of a project. Sometimes bets were made, but not in this review. There was too much at stake to be distracted by project metrics gambling.
The 10,000 found by the Federation was a pretty good result for eight days. It compared very well with the Borg’s 6,000 relevant documents found, but the difference between the teams was decreasing. At the end of the first five days the Federation had found 4,000 relevant documents, compared to the Borg’s 2,000. So the Borg were catching up.
Our efficiency measure, the number of relevant documents coded per man-hour of review, was still much higher than theirs. We were finding 69 relevant files per hour (18 hours of review per day, times 8 days, equals 144 hours; divided into the 10,000 documents found, equals 69.444). The Borg rate was only 12 relevant files per hour (64 hours of review per day, times 8 days, equals 512 hours; divided into the 6,000 relevant documents found, equals 11.719). That meant we were about 5.9 times more productive than the Borg.
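The productivity comparison works out as follows (figures from the story; note that the unrounded rates give a ratio closer to 5.9 than the 5.75 the rounded figures of 69 and 12 would suggest):

```python
# Relevant documents coded per man-hour of review, per the story's figures.
fed_hours = 3 * 6 * 8    # 3 reviewers, 6 hrs/day, 8 days = 144 hours
borg_hours = 8 * 8 * 8   # 8 reviewers, 8 hrs/day, 8 days = 512 hours

fed_rate = 10_000 / fed_hours    # ~69.4 relevant files per hour
borg_rate = 6_000 / borg_hours   # ~11.7 relevant files per hour

print(round(fed_rate / borg_rate, 2))  # -> 5.93
```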
But I was not smiling. We were just proving Borg inefficiency in low prevalence datasets, not defectiveness. We were winning the battle, but still losing the war. I got a dark feeling that day nine was going to be bad.
To be continued …