As many people know, especially my friends in the e-discovery world, I am a real hash enthusiast. No, not the corned beef variety shown here as a joke, and certainly not the illicit drug; I am talking about the mathematical wonder underlying e-discovery, the hash algorithm. In September 2006, I had a brainstorm on how to use the hash algorithm to replace the old-fashioned Bates stamp as a document organization and identification protocol for electronic documents. The hash stamp can not only identify all computer documents like the 100-year-old Bates stamp does for paper documents, it can also authenticate them and reveal if there have been any alterations from the original. This would serve to protect the legal profession from the ever-present danger of fraudulent manipulation of the ephemeral bits and bytes that now make up electronic evidence.
The authentication properties of hash have long been known and used in e-discovery, but there was a serious problem with also using hash as a naming protocol: hash values are way too long. The two most common kinds of hash are called MD5 and SHA-1. An MD5 hash is 32 hexadecimal characters long, and a SHA-1 hash is 40. Here is an example of the shorter MD5 hash:
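For readers who want to see the difference in length for themselves, here is a minimal Python sketch using the standard-library hashlib module. The sample bytes are an arbitrary stand-in for a document's contents, not an example taken from the article. Each character of the output is a hexadecimal digit (0-9, a-f) standing for four bits.

```python
import hashlib

# Arbitrary stand-in for the bytes of an electronic document (illustrative only).
document_bytes = b"Example contents of an electronic document."

md5_hex = hashlib.md5(document_bytes).hexdigest()    # 128 bits -> 32 hex characters
sha1_hex = hashlib.sha1(document_bytes).hexdigest()  # 160 bits -> 40 hex characters

print(md5_hex, len(md5_hex))
print(sha1_hex, len(sha1_hex))
```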
That is too long a number for humans to use to identify an electronic document. For that reason, hash was deemed impractical for use as a document naming protocol, even though it had tremendous advantages in authenticity control.
That is where my “big idea” came in last September: truncate the hash value and use only its first three and last three places. Under that system the above hash becomes the much more manageable:
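The truncation rule is simple enough to express in a few lines. The sketch below uses a hypothetical helper, `hash_stamp` (a name made up for illustration, not an existing tool), that keeps the first three and last three characters of a full hex digest. It uses the well-known MD5 test value for the string "abc" rather than the example hash in this post.

```python
import hashlib


def hash_stamp(hex_digest: str, head: int = 3, tail: int = 3) -> str:
    """Hypothetical helper: truncate a full hex digest to its first `head`
    and last `tail` characters, producing a short 'hash stamp'."""
    if head < 1 or tail < 1 or len(hex_digest) < head + tail:
        raise ValueError("invalid stamp size for this digest")
    return hex_digest[:head] + hex_digest[-tail:]


full = hashlib.md5(b"abc").hexdigest()
print(full)              # 900150983cd24fb0d6963f7d28e17f72
print(hash_stamp(full))  # 900f72
```

The full digest would still be kept alongside the stamp, since the six characters alone are a convenient label, not a complete authentication check.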
As explained further in the article, the six-place identification alone avoids collisions 98.6% of the time. In the rare event that two stamps match, the full hash values can be consulted. Credit goes to Bill Speros, a computer expert and attorney consulting in litigation technology and data management, for doing the statistical study that confirmed my intuition.
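A six-character hexadecimal stamp can take only 16 to the sixth power, or 16,777,216, distinct values, so the chance of a match depends on how many documents are in a collection and on how "collision" is measured. The sketch below is my own illustration using standard birthday-problem arithmetic, assuming stamps behave like uniformly random values. It is not a reproduction of the Speros study and makes no attempt to recover the 98.6% figure. It computes two different measures, then shows the fallback to full hash values when stamps match. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict

# Number of distinct six-character hexadecimal stamps: 16 ** 6 = 16,777,216.
STAMP_SPACE = 16 ** 6


def chance_any_collision(n_docs: int, space: int = STAMP_SPACE) -> float:
    """Probability that at least two of n_docs documents share a stamp
    (the classic birthday problem), assuming stamps are uniformly random."""
    p_all_unique = 1.0
    for i in range(n_docs):
        p_all_unique *= 1 - i / space
    return 1 - p_all_unique


def chance_doc_collides(n_docs: int, space: int = STAMP_SPACE) -> float:
    """Probability that one particular document shares its stamp with at
    least one of the other n_docs - 1 documents, under the same assumption."""
    if n_docs < 2:
        return 0.0
    return 1 - (1 - 1 / space) ** (n_docs - 1)


def stamp(hex_digest: str) -> str:
    """First three plus last three hex characters of a full digest."""
    return hex_digest[:3] + hex_digest[-3:]


def true_collisions(documents: dict[str, bytes]) -> dict[str, dict[str, str]]:
    """Group documents by stamp and return only the stamps shared by
    documents whose full MD5 values differ. Exact duplicates (same full
    hash) have identical content, so they are not reported."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for name, data in documents.items():
        full = hashlib.md5(data).hexdigest()
        groups[stamp(full)][name] = full
    return {s: members for s, members in groups.items()
            if len(set(members.values())) > 1}


for n in (1_000, 100_000):
    print(n, round(chance_any_collision(n), 4), round(chance_doc_collides(n), 6))
```

Under these assumptions, a collection of tens of thousands of documents is very likely to contain at least one shared stamp somewhere, even though any particular document is unlikely to share its stamp with another. That gap is why the fallback to the full hash value matters.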
I got excited about the idea and wanted to promote it. I decided the best way to do this was to take my time, thoroughly research hash and related subjects, and then write a law review article on the subject. The article would not only advocate this naming idea, but also fully explain what hash was all about, and how hashing was far superior to Bates stamping for the identification and organization of large volumes of ESI.
Once I got into this project, it took on a life of its own. Over the next nine months, I ended up reading every legal case and article that in any way pertained to hash, and hundreds more outside of the law on how hash was being used by scientists, spies, and mathematicians in a variety of technologies. Before I knew it, I was up to a 44-page article with 174 footnotes. Countless evenings and weekends were lost in the process. My golf handicap soared. But this near-magical cryptographic formula called hash can be very addictive!
I am relieved to say that the writing part of the project concluded last month with the publication of HASH: The New Bates Stamp, 12 Journal of Technology Law & Policy 1 (June 2007). The article advocates the truncated hash naming protocol, sets forth all of the case law in this area, and explains what hash is all about without actually going into the mathematics. I readily admit that is beyond me.
Although the Journal of Technology Law & Policy citation refers to this publication as the mid-year June issue, in fact it was not printed until mid-August. The law students at the University of Florida School of Law who run the journal did a good job of cite checking, proofing and otherwise making me look good. So too did my son, Adam Losey, who is enrolled as a law student there and helped me with research. Other articles in the Journal on patents and copyrights are interesting too. You can order a copy of the full Journal for $15.00 by email to firstname.lastname@example.org or by snail mail to the Journal of Technology Law & Policy at 218 Bruton Geer, Gainesville, FL 32611. Help the students out and buy a full copy of the journal.
But if you just want to see the article right now, without the full law review formatting, you can download just the article, HASH: The New Bates Stamp, which I have uploaded here as a PDF file. You may copy and forward this article to others, so long as you do so without charge and do not alter the contents.
I had never written a law review article before. So I was quite surprised when, after publication, a law professor explained to me that my task was not complete. After an article is published, it is standard procedure to send copies to as many people as possible who might be interested in the subject, and especially to judges who might cite it. Based on this advice, I sent out a couple of hundred copies of the article to experts in the field of e-discovery, including practitioners like myself, academicians, judges and e-discovery vendors.
I did so in the hope that they would not only use the article as a reference for all things hash, but also help promote the new naming protocol. I strongly believe that the use of hash is imperative to the future integrity of the legal system as we move from paper to electronic bits and bytes. Digital evidence is so easy to change, both intentionally and by mistake, that the legal profession needs hash to protect the authenticity of electronic evidence. The simple Bates stamp is not up to the task.
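To make the authentication point concrete, here is a minimal Python sketch, assuming the full SHA-1 value of a document was recorded when it was collected. The function name `is_unaltered` and the sample text are hypothetical, chosen only for illustration. Changing a single character produces a completely different hash, so the comparison fails.

```python
import hashlib


def is_unaltered(document_bytes: bytes, recorded_sha1: str) -> bool:
    """Recompute the SHA-1 hash and compare it with the value recorded
    at collection time (hypothetical workflow, for illustration)."""
    return hashlib.sha1(document_bytes).hexdigest() == recorded_sha1.lower()


original = b"The contract price is $10,000."
recorded = hashlib.sha1(original).hexdigest()  # recorded when collected

altered = b"The contract price is $19,000."    # one character changed

print(is_unaltered(original, recorded))  # True
print(is_unaltered(altered, recorded))   # False
```

One design note: practical, deliberately engineered collisions have since been demonstrated for both MD5 and SHA-1, so where intentional tampering is the main concern, a newer function such as SHA-256 (also available as `hashlib.sha256`) may be the safer choice. The comparison logic stays the same.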
Many who received the article expressed thanks, and many vendors in particular have already expressed interest in using the protocol. The first response I received, within just two days of mailing, was from Judge John L. Carroll, Dean and Professor of Law, Cumberland School of Law, Samford University. Although I do not know Judge Carroll personally, I had of course heard of him, and had once heard him speak on e-discovery. To my knowledge, he is now the preeminent academic authority on e-discovery. I was surprised that he responded so quickly, and even more pleased to read what he had to say. First he thanked me for the article and then said:
I teach a seminar in e-discovery and evidence law and I am making it required reading for my students. It is really thoughtful and very well done.
I was proud to receive that, and several other positive responses. But I did learn, secondhand, that at least one recipient of the Hash article suspected commercial motives on my part, and criticized the mailing for that reason. I doubt they actually read the article; they just saw a letter from a lawyer sending an article and assumed it was another pesky “white paper” pointing to some software or consulting services. But the truth is, I sell no hash software and offer no hash or hashing services. Also, so far as I know, no lawyer has ever been hired because of his or her knowledge of a mathematical formula, much less a cryptographic formula like hash. I just think that “truncated hash marking” is a good idea whose time has come. I really believe that lawyers need to use hash to protect the integrity of electronic evidence, and thus protect the whole system of justice in the electronic age. A bit grandiose perhaps, but that is the motivation, not the delusional hope of some commercial gain.
But this is America, and a few people apparently suspect there must be a hidden profit agenda here somewhere (not that there is anything wrong with that), so let me set the record straight. This is strictly an open source, freeware proposal. Although I appear to be the first to think of the hash truncation idea, I do not want to delay or hinder anyone else from using it. Just the contrary: anyone can use the idea and proposal anywhere and any time they want, and they owe me nothing. I have nothing to sell, and don’t plan to.
As I explained in footnote 161 of the Hash article, a few patent attorneys I talked to while researching the proposal suggested that my idea might be patentable, but I had no interest in going that route. Commercial applications and ownership claims would only interfere with and delay the adoption of this proposal. My interest then and now is to freely disseminate this idea and dedicate it to the public domain.
There may well someday be commercial exploitations of the idea by others, which is well and good. If so, this will be by e-discovery vendors and other IT document management experts with far greater technical expertise and experience in large volume document management than I.
Anyway, this hash article has been a labor of love over the past year. I urge you to please download and read a copy, and share it with any colleagues who might be interested.