Byte and Switch, my future-law robots, star here in another video animation, this time on random sampling. They explain how sampling is used in machine-learning-based evidence review. In this first segment of a two-part video, set sometime in the near future, we watch Switch help Byte get ready to give expert testimony in a Daubert hearing. The presiding Judge, David J. Waxse, routinely insists on that sort of thing in the future. See: Waxse & Yoakum-Kriz, Experts on Computer Assisted Review: Why Federal Rule of Evidence 702 Should Apply To Their Use, 52 WBJLJ 207 (Spring 2013).
Byte, who is an expert by virtue of his knowledge-base, programming, and search experience, makes the perfect witness. Verified programming establishes that he is incapable of lies or evasion. Not only that, he has total recall of everything that happened in every search project he has been involved with. Still, Switch needs to help Byte get ready to testify. Byte, like the scientists and programmers who created him, needs to learn how to speak simply enough for non-expert humans to comprehend. This animation shows Byte practicing for his testimony.
In this video Byte (shown right) explains how and why random samples are taken at the start of a project, before the active learning training begins. Byte also explains that random sampling is used again, in a limited fashion, during the training. (The Borg-type predictive coding software that relies entirely on random chance has, in this near-future scenario, long since been discredited and abandoned.) In part two Byte and Switch will go on to explain final quality assurance sampling at or near the end of a robot-enhanced search project.
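For readers who want to see the arithmetic behind an initial random sample, here is a minimal sketch of one common use: estimating prevalence, the share of relevant documents in a collection. It is not taken from the video or from any particular review software. The collection, the sample size of 1,500, the fixed seed, and the function name `estimate_prevalence` are all assumptions made up for illustration, and the interval uses the simple Gaussian approximation rather than an exact Binomial calculation.

```python
import math
import random


def estimate_prevalence(labels, sample_size, seed=0, z=1.96):
    """Estimate prevalence from a simple random sample.

    labels      -- hypothetical list of 0/1 relevance judgments, one per document
    sample_size -- number of documents drawn without replacement
    z           -- 1.96 gives an approximate 95% interval (Gaussian approximation)
    """
    rng = random.Random(seed)                  # fixed seed so the draw is repeatable
    sample = rng.sample(labels, sample_size)   # simple random sample, no replacement
    p = sum(sample) / sample_size              # point estimate of prevalence
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical collection: 10,000 documents, of which 1,000 are relevant.
collection = [1] * 1000 + [0] * 9000
p, low, high = estimate_prevalence(collection, sample_size=1500)
print(f"estimated prevalence: {p:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

In a real project the relevance labels would come from human review of the sampled documents, not from a list known in advance; the list here only stands in for that review.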
As usual, pause to let the streaming video get ahead, especially if your connection is slow, and enlarge the video to full screen for best effect.
Special thanks to William Webber, Information Scientist, for his background information and help. William has endured hours of my Switch-like questioning on random sampling in active machine learning search projects. His explanations of sampling have been invaluable, including such esoteric topics as Gaussian and Binomial calculations; Simple Random and Stratified Random sampling (William’s speciality); quality control sampling for testing, as opposed to training; prevalence; concept shift; and recall testing. All credit goes to William for what I get right in this future-scenario of random sampling. Any mistakes in the explanation, or errors in predictions, are entirely my own.
For the earlier adventures of Byte and Switch, see: