Borg Challenge: Part One of my experimental review of 699,082 Enron documents using a semi-automated monomodal methodology

This is the first in a series of reports on a fifty-hour predictive coding experiment using a semi-automated, single-search method. I usually refer to this method as the Borg approach, after the infamous Star Trek villains. See Never Heard of Star Trek’s Borg? This method is used by many predictive coding software systems. It contrasts with the hybrid multimodal search method that I use, along with many others.

Multimodal in the context of predictive coding means the use of all types of searches to find machine training documents, not just one. I call it a hybrid approach because the human search expert remains in control of the machine learning. I recently discovered that this follows a well-established practice in information retrieval science called Human–computer information retrieval (HCIR). The multimodal hybrid approach stands in contrast to a monomodal approach, where only predictive coding is used and where human interaction in the process is minimized or reduced entirely to yes-no decisions.
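To make the distinction concrete, here is a minimal toy sketch in Python. It is only an illustration under my own assumptions, not how Inview or any other predictive coding product actually works: the function names (`keyword_selector`, `ranked_selector`, `monomodal_batch`, `multimodal_batch`), the four sample documents, and the stand-in scoring function are all hypothetical. The idea it shows is simply that a monomodal round takes its training candidates from the machine ranking alone, while a multimodal round pools candidates from several kinds of search.

```python
from typing import Callable, Iterable, List, Set

# A "selector" looks at the collection and returns the indices of
# documents it proposes for human review (hypothetical abstraction).
Selector = Callable[[List[str]], Set[int]]


def keyword_selector(terms: Iterable[str]) -> Selector:
    """Propose every document containing any of the given terms."""
    lowered = [t.lower() for t in terms]

    def select(docs: List[str]) -> Set[int]:
        return {i for i, d in enumerate(docs)
                if any(t in d.lower() for t in lowered)}

    return select


def ranked_selector(score: Callable[[str], float], top_k: int) -> Selector:
    """Propose the top_k documents by a machine relevance score."""

    def select(docs: List[str]) -> Set[int]:
        order = sorted(range(len(docs)),
                       key=lambda i: score(docs[i]), reverse=True)
        return set(order[:top_k])

    return select


def monomodal_batch(docs: List[str], machine: Selector) -> Set[int]:
    """Monomodal: only the machine ranking feeds the next review batch."""
    return machine(docs)


def multimodal_batch(docs: List[str], selectors: List[Selector]) -> Set[int]:
    """Multimodal: pool candidates from every search method in use."""
    batch: Set[int] = set()
    for select in selectors:
        batch |= select(docs)
    return batch


# Toy data and a stand-in "classifier"; both are made up for illustration.
docs = [
    "Please terminate his employment effective Friday",
    "Lunch menu for the trading floor",
    "She was fired after the audit",
    "Gas pipeline capacity report",
]
toy_score = lambda d: 1.0 if "terminate" in d.lower() else 0.0
machine = ranked_selector(toy_score, top_k=1)

mono = monomodal_batch(docs, machine)
multi = multimodal_batch(docs, [machine, keyword_selector(["fired"])])
print(sorted(mono), sorted(multi))
```

In this toy run the keyword search surfaces a document that the stand-in ranking missed. That is the intuition behind the multimodal approach, nothing more; whether it holds up on real data is the question the experiment is meant to test.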

I think the hybrid multimodal method is superior to the popular alternative Borg methods, but I have no real factual basis for this supposition. To my knowledge no comparisons or other tests have ever been made between the two methodologies. My opinion was based on logic alone: if one search method is good, many should be better. But as I have previously stated here, my opinion could be wrong. After a lifetime in the law I know that for judgments to be valid they must be based on evidence, not just reason. See, e.g., There Can Be No Justice Without Truth, And No Truth Without Search. That is also a basic tenet of our scientific age. For these reasons I decided to perform an experiment to test the Borg approach against my own preferred methods. Thus the Borg Challenge experiment was born.

I had already performed a fifty-hour search project using my methods, which I reported in Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. That set a benchmark for me to test the Borg approach against. My experiment would be to repeat the same search I did before, but using the competing method. I would once again search an Enron dataset of 699,082 emails and attachments for evidence of involuntary employment terminations. I would also use the same software, Kroll Ontrack’s Inview, but this time I would configure and use it according to the Borg method. (One of the strengths of Inview is the flexibility of its features, which allowed for easy adaptation to this alternative method.)

As far as I know, this kind of test comparison of the same large search project, by the same person, searching the same data, but with different methods, has never been done before. I began my project in 2012 during Christmas vacation. I did not finish my final notes and analysis until March 2013. The results of my experiment may surprise you. That is the beauty of the scientific method. But you will have to hang in there with me for the full experiment to see how it turned out. I learned many things in the course of this experiment, including my endurance level at reading Enron chatter.

As in the first review, I spent fifty hours to make it a fair comparison. But the whole project took quite a bit longer than that, as there is more to this work than just reading Enron emails. I had to keep records of what I did and create a report to share my journey. In my original hybrid multimodal review I wrote a 72-page narrative describing the search, Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. I know it is a difficult read for all but the most ardent searchers, and it was just as difficult to write. For these reasons I looked for a way to spice up my reporting of this second effort. I wanted to try to entice my fellow e-discovery professionals into following my little experiment. My solution was to use video reports instead of a long written narrative. Then I added some twists to the video, twists that you will see for yourself as the Challenge of the Borg experiment unfolds.

Here is the opening act. It begins in my backyard in Florida in late December 2012.

Stay tuned for Borg Challenge: Part Two where I begin the search, as per my usual methods, with a random sample.

7 Responses to Borg Challenge: Part One of my experimental review of 699,082 Enron documents using a semi-automated monomodal methodology

  1. Fantastic – do a file listing of the 699,082 Enron documents and post the DocIDs!

  2. […] In this segment I begin the search project with a random sample. Then I began the first two rounds of machine training with the usual auto coding runs and prediction error corrections (steps four and five in the below diagram). The second video below describes the first two iterative rounds. For these videos to make sense you first need to read and watch Borg Challenge: Part One of my experimental review of 699,082 Enron documents using a semi-automated monomodal metho…. […]

  3. […] through sixteen of the training. For these videos to make sense you first need to read and watch Part One and Part Two of the Borg Challenge. If this still makes no sense, you could try reading […]

  4. […] like a character in a Kafka novel. For these videos to make sense you first need to read and watch Part One, Part Two, and Part Three of the Borg Challenge. Even then, who knows? Kafkaesque videos providing […]

  5. […] identical in every respect, except for methodology. The experiment itself is described in Part One, Part Two, Part Three and Part Four of the Borg Challenge. The results reported in my videos […]

  6. […] the entire narrative of the Borg Challenge that was previously reported in five installments: Part One, Two, Three, Four and […]

  7. […] Borg Challenge: Part One of My Experimental Review of 699,082 Enron Docs  http://bit.ly/17mF4XH(@RalphLosey) […]
