Law Review Article Published on the Mathematics Underlying e-Discovery: “HASH: The New Bates Stamp”

[Image: corned beef hash, shown with the actual hash values for this file]

As many people know, especially my friends in the e-discovery world, I am a real hash enthusiast. No, not the corned beef variety shown here as a joke, and certainly not the illicit drug; I am talking about the mathematical wonder underlying e-discovery, the hash algorithm. In September 2006, I had a brainstorm on how to use the hash algorithm to replace the old-fashioned Bates stamp as a document organization and identification protocol for electronic documents. The hash stamp can not only identify every computer document, as the 100-year-old Bates stamp does for paper documents, but also authenticate it and reveal whether it has been altered from the original. This would help protect the legal profession from the ever-present danger of fraudulent manipulation of the ephemeral bits and bytes that now make up electronic evidence.
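To make the alteration-detection point concrete, here is a minimal Python sketch using the standard library's `hashlib`. The two sample sentences are invented for illustration and differ by a single digit, yet they produce different MD5 values, which is why a hash recorded at one point in time can reveal a later change.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as uppercase hexadecimal."""
    return hashlib.md5(data).hexdigest().upper()

# Hypothetical document contents; only one digit differs.
original = b"Payment of $10,000 is due on March 1."
altered = b"Payment of $90,000 is due on March 1."

# Recompute and compare: any change to the bytes changes the digest.
verdict = "unchanged" if md5_hex(original) == md5_hex(altered) else "altered"
print(md5_hex(original))
print(md5_hex(altered))
print(verdict)
```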

The authentication properties of hash have long been known and used in e-discovery, but there was a serious problem with also using hash as a naming protocol: hash values are way too long. The two most common kinds of hash are MD5 and SHA-1. An MD5 hash is 32 hexadecimal characters (the digits 0-9 and the letters A-F), and a SHA-1 hash is 40. Here is an example of the shorter MD5 hash:

5F0266C4C326B9A1EF9E39CB78C352DC

That is too long a number for humans to use to identify an electronic document. For that reason, hash was deemed impractical for use as a document naming protocol, even though it had tremendous advantages in authenticity control.
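The lengths follow from the digest sizes: each hexadecimal character encodes 4 bits, so MD5's 128-bit output prints as 32 characters and SHA-1's 160-bit output as 40. A short Python sketch using the standard `hashlib` module shows this; the sample input is a placeholder, not a real document.

```python
import hashlib

def digest_hex(algorithm: str, data: bytes) -> str:
    """Hash data with the named hashlib algorithm and return uppercase hex."""
    return hashlib.new(algorithm, data).hexdigest().upper()

sample = b"Hypothetical document contents"
for name in ("md5", "sha1"):
    value = digest_hex(name, sample)
    # 4 bits per hex character: 128 bits -> 32 chars, 160 bits -> 40 chars.
    print(f"{name}: {len(value)} characters -> {value}")
```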

That is where I got the “big idea” last September: truncate the hash value and keep only its first three and last three places. Under that system, the above hash becomes the much more manageable:

5F0.2DC  
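The truncation rule is simple enough to express in a few lines. This Python sketch is one possible implementation, assuming the full value is supplied as a hexadecimal string; the function name `truncated_label` is hypothetical.

```python
def truncated_label(full_hash: str, places: int = 3) -> str:
    """Keep the first and last `places` characters of a hex hash, joined by a period."""
    value = full_hash.strip().upper()
    if len(value) < 2 * places:
        raise ValueError("hash is too short to truncate")
    return f"{value[:places]}.{value[-places:]}"

print(truncated_label("5F0266C4C326B9A1EF9E39CB78C352DC"))  # 5F0.2DC
```

Normalizing to uppercase means the same file yields the same label whether a tool reports its hash in upper- or lowercase.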

As explained further in the Article, the six-place identification alone avoids collisions 98.6% of the time. In the rare event that two labels match, the full hash values can be consulted. Credit goes to computer expert Bill Speros, an attorney consulting in litigation technology and data management, for doing the statistical study to confirm my intuition.
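How often short labels collide depends on how many documents are labeled and on exactly what is measured. The sketch below does not reproduce the Speros study, whose assumptions are not restated in this post. Instead it assumes labels behave like uniformly random six-hex-digit values (16^6 possibilities) and applies the standard birthday-problem formulas to two different questions: whether every label in a collection is distinct, and whether one particular document's label is shared by no other. The hypothetical `label_collisions` helper illustrates the fallback described above, flagging any short label shared by different full hashes so the full values can be consulted.

```python
import math
from collections import defaultdict

# Six hexadecimal places give 16**6 = 16,777,216 possible labels.
LABEL_SPACE = 16 ** 6

def prob_all_labels_distinct(n_docs: int, space: int = LABEL_SPACE) -> float:
    """Birthday-problem probability that no two of n_docs labels coincide.

    Assumes each label behaves like an independent, uniformly random value.
    """
    if n_docs > space:
        return 0.0
    # Product of (1 - i/space) for i = 0..n_docs-1, summed as logs for stability.
    return math.exp(sum(math.log1p(-i / space) for i in range(n_docs)))

def prob_one_label_unique(n_docs: int, space: int = LABEL_SPACE) -> float:
    """Probability that one given document's label matches none of the other n_docs - 1."""
    return (1 - 1 / space) ** (n_docs - 1)

def label_collisions(full_hashes):
    """Return {label: [full hashes]} for labels shared by two or more distinct hashes."""
    groups = defaultdict(set)
    for h in full_hashes:
        h = h.strip().upper()
        groups[f"{h[:3]}.{h[-3:]}"].add(h)  # identical files collapse into one entry
    return {label: sorted(hs) for label, hs in groups.items() if len(hs) > 1}

for n in (1_000, 10_000, 100_000):
    print(n, round(prob_all_labels_distinct(n), 4), round(prob_one_label_unique(n), 4))
```

As collections grow, the chance that every label is distinct falls much faster than the chance that any single document's label is unique, which is why the fallback to the full hash matters in large productions.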

I got excited about the idea and wanted to promote it. I decided the best way to do this was to take my time, thoroughly research hash and related subjects, and then write a law review article on the subject.  The article would not only advocate this naming idea, but also fully explain what hash was all about, and how hashing was far superior to Bates stamping for the identification and organization of large volumes of ESI. 

Once I got into this project, it took on a life of its own. Over the next nine months, I ended up reading every legal case and article that in any way pertained to hash, and hundreds more outside of the law on how hash was being used by scientists, spies, and mathematicians in a variety of technologies. Before I knew it, I was up to a 44-page article with 174 footnotes. Countless evenings and weekends were lost in the process. My golf handicap soared. But this near-magical cryptographic formula called hash can be very addictive!

I am relieved to say that the writing part of the project concluded last month with the publication of HASH: The New Bates Stamp, 12 Journal of Technology Law & Policy 1 (June 2007).  The article advocates the truncated hash naming protocol, sets forth all of the case law in this area, and explains what hash is all about without actually going into the mathematics.  I readily admit that is beyond me.  

Although the Journal of Technology citation refers to this publication as the mid-year June issue, in fact it was not printed until mid-August.  The law students at the University of Florida School of Law running the journal did a good job of cite checking, proofing and otherwise making me look good.  So too did my son, Adam Losey, who is enrolled as a law student there and helped me with research. Other articles in the Journal on patents and copyrights are interesting too.  You can order a copy of the full Journal for $15.00 by an email to mail@wshein.com or by snail mail to the Journal of Technology Law & Policy at 218 Bruton Geer, Gainesville, FL 32611. Help the students out and buy a full copy of the journal. 

But if you just want to see the article right now, without the full law review formatting, you can download the article only – HASH: The New Bates Stamp, which I have uploaded here as a PDF file. You may copy and forward this article to others, so long as you do so without charge, and do not alter the contents.

I had never written a law review article before. So I was quite surprised when, after publication, a law professor explained to me that my task was not complete. After an article is published, it is standard procedure to send copies of it to as many people as possible who might be interested in the subject, and especially to judges who might cite it. Based on this advice, I sent out a couple of hundred copies of the article to experts in the field of e-discovery, including practitioners like myself, academicians, judges and e-discovery vendors.

I did so in the hope that they would not only use the article as a reference for all things hash, but also help promote the new naming protocol. I strongly believe that the use of hash is imperative to the future integrity of the legal system as we move from paper to electronic bits and bytes. Digital evidence is so easy to change, both intentionally and by mistake, that the legal profession needs hash to protect the authenticity of electronic evidence. The simple Bates stamp is not up to the task.

Many who received the article expressed thanks, and many vendors especially have already expressed interest in using the protocol. The first response I received, within just two days of mailing, was from Judge John L. Carroll, Dean and Professor of Law, Cumberland School of Law, Samford University. Although I do not know Judge Carroll, I had of course heard of him, and heard him speak once on e-discovery. To my knowledge, he is now the preeminent academic authority on e-discovery. I was surprised that he responded so quickly, and even more pleased to read what he had to say. First, he thanked me for the article and then said:

I teach a seminar in e-discovery and evidence law and I am making it required reading for my students. It is really thoughtful and very well done.

I was proud to receive that, and several other positive responses. But I did learn, secondhand, that at least one recipient of the Hash article suspected commercial motives on my part, and criticized the mailing for that reason. I doubt they actually read the article; they just saw a letter from a lawyer sending an article and assumed it was another pesky “white paper” pointing to some software or consulting services. But the truth is, I sell no hash software and offer no hash or hashing services. Also, so far as I know, no lawyer has ever been hired because of his or her knowledge of a mathematical formula, much less a cryptographic formula like hash. I just think that “truncated hash marking” is a good idea whose time has come. I really believe that lawyers need to use hash to protect the integrity of electronic evidence, and thus protect the whole system of justice in the electronic age. A bit grandiose perhaps, but that is the motivation, not the delusional hope of some commercial gain.

But, this is America, and a few people apparently suspect there must be a hidden profit agenda here somewhere (not that there is anything wrong with that), so let me set the record straight.  This is strictly an open source, freeware proposal. Although I appear to be the first to think of the hash truncation idea, I do not want to delay or hinder anyone else from using the idea. Just the contrary; simply put, anyone can use the idea and proposal anywhere and any time they want, and they owe me nothing. I have nothing to sell, and don’t plan to.

As I explained in footnote 161 of the Hash article, a few patent attorneys I talked to while researching the proposal suggested that my idea might be patentable, but I had no interest in going that route. Commercial applications and ownership claims would only slow the adoption of this idea and delay implementation of the proposal. My interest then and now is to freely disseminate and dedicate this idea into the public domain.

There may well someday be commercial exploitations of the idea by others, which is well and good.  If so, this will be by e-discovery vendors and other IT document management experts with far greater technical expertise and experience in large volume document management than I.

Anyway, this hash article has been a labor of love over the past year.  I urge you to please download and read a copy, and share it with any colleagues who might be interested.

15 Responses to Law Review Article Published on the Mathematics Underlying e-Discovery: “HASH: The New Bates Stamp”

  1. Rob Robinson says:

    Ralph – excellent post and excellent presentation of the idea of truncated hashing. Thanks for the openness in sharing such a salient idea.

    Regards/Rob Robinson

  2. […] as a means of identifying the document. His e-Discovery Team blog entry announcing the article is here, while the article itself can be read here. This entry is filed under Miscellaneous. You can […]

  3. Heidi Maher says:

    I cannot imagine how you could’ve received any criticism for sending out the law review article. I found it very enlightening and useful. Everyone should review and evaluate for incorporation into an e-discovery protocol.

  4. […] copy, originally published in the June 2007 edition of the Journal of Technology Law & Policy, here. Mr. Losey explains what hash values are, how they are used in litigation and proposes that a file […]

  5. […] post by Ralph Losey on how to abbreviate the hash code in order to create a relevant bates number fits nicely into this discussion, although implementing […]

  6. Greg says:

    First off, great job on the article Ralph. I do have a couple questions / points in response to your paper. You talk about hashing files to prove that they haven’t changed…for certain types of electronic archives- take Microsoft Outlook PSTs as an example – a hash of the PST file (really a container) is useful in authenticating the PST itself, but not later on when you choose to produce a single email with attachments.

    Now, you could say that you should then hash the individual email and attachments…but the challenge here is what should you hash? You could save the email as an MSG file and hash it…but unfortunately when you save an MSG file the bits and bytes are different each time – so the same document will have a different hash. You could save a representation of the file such as RTF or HTML, but different vendors or different software versions could result in different outputs.

    This can also be challenging if you must produce reviewed documents back in a PST format. Creating a PST that is a subset will obviously alter its hash and then you also won’t have a great way of referring to individual messages. Any thoughts on how your ideas can be applied to archives, specifically email archive formats?

    Lastly….for files that contain references to other files – say HTML and images – changing any file names will break referential integrity. This could also be the case for email archives where changing the file name of an attachment would require changing the content of the parent to keep the relationship intact. Thoughts?

  7. Ralph Losey says:

    Greg, that is an excellent issue you raise re the individual emails in a PST file, part of the larger problem of unpacking and identifying files in an archive. I suppose a set of standards will need to be developed and followed by all vendors for uniformity of hash values; i.e. – how to go about saving an email from the pst to msg so that there is as little alteration as possible, and the same hash is reproduced.

    I do not have a particular suggestion at this time on a standard for this. Does anyone else have any thoughts on this? Suggestions? Over time I am confident that people with greater technical expertise and experience than I have will be able to figure out good solutions to these and other problems.

  8. steve devlin says:

    Great article! I run litigation support document management for a government law office. We often get responsive productions of 50,000+ pages of documents on a DVD (TIF, TXT, and Summation load files). I’d like to save the hash values for the disks, folders, and their contents (disk/folder level tells me I have copied everything completely; file level gives me individual document integrity). Implementing this is another story. I cannot find hashing programs that handle folders or entire disks. Karen’s Directory Printer produces a file listing with MD5 and/or SHA hashes. Its format is the best I’ve seen for bulk work, but it is too slow for “big” projects. I’m looking for a “commercial-grade” tool capable of handling DVDs and HDDs. Any suggestions?

  9. […] reference to hash, is puzzling. Hash coding is a standard procedure for all competent e-discovery vendors and this […]

  10. Hi Ralph, first thank you for your time and effort to author content in the field of hashing as applied to the legal field and for providing a forum for discussion.

    Hashing is indeed fundamental to the ability to determine and demonstrate what I like to call “intrinsic authenticity.” This is distinct from “inferred” approaches which presume authenticity based on the presence of effective external controls and trusted insiders. However, it is clear from recent events that insiders are not always trustworthy nor are controls always effective in keeping malicious individuals out. In fact, over 70% of security breaches occur as a result of insiders.

    My comment relates to the limitations of hashing alone. A hash value can certainly be generated and placed into the metadata (or associated with the original data), but just as easily as a hash can be generated and inserted into the metadata, it can be regenerated at any time and reinserted. If this cannot be prevented or detected, then the incredible power of hashing can actually create a situation where falsified information that contains a hash is perceived to be irrefutably authentic.

    A hash is very useful in determining whether two records are identical or whether a record is unchanged. In fact, de-duplication is premised on the first use case, where the only question is: does a record produce the same hash as another record? If yes, they are identical. However, the second use case (is the record unchanged?) requires more than just a test of integrity (i.e., hash comparison). Asking whether the record is unchanged raises the question: unchanged since when?

    Although a hash is a unique digital fingerprint of a set of data, on its own it is “floating,” unanchored to the one reference that is beyond the control of everyone, insiders and outsiders alike: time. That is, if one can circumvent the controls around the record, as in criminal or organized-crime hacking, or if one controls the controls around the data, as a System Administrator does, or if an executive can collude with or coerce such an individual, one can easily go into the content or records management system, falsify the file, create a new fraudulent hash of the falsified file, and then insert the fraudulent hash into the metadata of the falsified file. When tested, a fresh hash of the falsified file will match (compare positively with) the fraudulent hash in its metadata. This creates a false sense of integrity that will be very difficult to refute, given the current understanding that if you have a hash, it is presumed “… to ensure integrity.”

    Do not get me wrong, hashing is fundamental and a key step in the right direction, but it is “necessary but insufficient.” Once the hash is generated, the issue shifts to the “chain of custody” around the hash and when it was generated. Until the hash and file content are cryptographically bound to time, there is a realistic possibility that the above can and will occur, given that the necessary skill level is not very high.

    “Outside in” or perimeter-based approaches to preserving the chain-of-custody that depend on the system or trustworthy individuals are more complex, more costly and lower assurance than persistent data-level intrinsic approaches derived from cryptographic hash binding. The technology to cryptographically bind a hash to time is well established and embodied in the American National Standards Institute (ANSI) X9.95-2005.

    Thoughts?

  11. Ralph Losey says:

    Thank you for your comment, which, as you know, follows a lengthy discussion with your colleague, Paul Doyle. Paul is the owner of your company, Proof Space, and we had a lively phone discussion on this already.

    We agree on much, including the importance of hash, and the need to verify the authenticity of ESI, especially in an e-discovery context. I also understand some of the important benefits and additional security offered by your company’s time-stamp digital signature based hash software products. You may recall I mentioned this type of application of hash in my law review article. Adding an encryption-based signature that proves the time of the creation of a file certainly has important benefits in some circumstances, especially where proof of time might be important, as with patent documents, or where there is a viable risk of attack and alteration of documents after creation. It looks like a very good system to ensure that a document has not been modified since it was created.

    But, as you know, we also disagree on the need for the routine employment of your software, or others like it, as part of the standard e-discovery process, or as part of a standard enterprise document management system. To me it seems like overkill to use it all of the time.

    If ESI is hashed at the time of original e-discovery collection, and proper forensic chain of custody procedures are then followed, then that proves the identity of the ESI at a later time. It proves it is a bona fide copy of the original file or document that was in the producing party’s computer system at the time of collection. That is sufficient to prove authenticity almost all of the time, and sufficient for the admission of ESI as evidence, barring other circumstances, including the circumstances you mention of fraud where the genuineness of the file is challenged.

    Granted there are circumstances where proof of identity of ESI at the time of collection may not be sufficient. There may be evidence that a file was forged before collection, in which case the Hash at collection would only prove identity, that this is the same file that existed in the producing party’s computers when the collection was made. It would not prove that it has not been altered between the time of original creation and time of collection. But still, the genuineness of a document, including an electronic document, is presumed by its location in business records and other evidence of routine business practices, unless there is some evidence of fabrication or alteration. Evidence of fraud could include testimony of the original creator of the file, who might say this was changed after I made it, or testimony for a third party who might say this is not the same file sent to me at that time.

    In other words, the hash value alone would not prove that the file had not been altered BEFORE collection. If challenged, you would have to provide testimony to support its presumed genuineness. Only your specialized hash/time software could prove it was the same file that was originally created, and even then, only if your software was used at the time of creation. Put another way, Hash without document origination based time stamp can only prove that the ESI was a genuine copy of the original ESI in the company’s computers at the time of collection. I agree with that point, my disagreement is only with the evaluation of the significance of this limitation, both practical and legal.

    The same issue you mention also arises in the collection of paper documents. It is always possible that a business could have a forged document in its records. Still, the law presumes that a record found in a business is bona fide and authentic, so long as you can prove it is an exact copy of the original. It is not, however, a conclusive presumption; the burden lies on the objecting party to provide some evidence that the document is not genuine, that it is a fraud. This in turn relies upon comparison with other documents, witness testimony (especially the testimony of the person who created the file to begin with), and in some cases, forensic examination of the computer systems involved.

    True, if a company had used your software to time stamp all of its ESI, then falsification would be much more difficult, perhaps even impossible (but I never underestimate the creativity of criminals, and falsification of evidence is a crime). Still, even if your software was involved, a court would probably also want to hear the testimony of the person or persons who created the ESI, and see comparisons, where available, with any other documents that claim to be the original. I understand from Judge Grimm that it is theoretically possible for two copies of the same document to be considered authentic, and then offered to the jury to decide which one was genuine. Again, interesting theory, but even Judge Grimm could not think of any case where this had ever happened. I know it has never happened to me in my 28 years of litigation.

    In sum, I remain unconvinced that, to prevent even the possibility of fraud, a company should time stamp and hash each document that it creates. This is not practical. Large companies create millions, if not billions, of ESI files every day. The routine employment of software such as that offered by your company is, in my view, unnecessary overkill as a general practice for all ESI. It adds a layer of time, expense and burden not required for 99% of most companies’ ESI.

    Still, sometimes a business might want to secure a file with your software, especially, as mentioned, for patent or other time-dependent or sensitive records. So, don’t get me wrong, I think you offer a valuable piece of software, one that ingeniously incorporates and builds upon the power of Hash. I just think you go too far in suggesting that it should always be employed or else you risk making your electronic evidence inadmissible.

    We also seem to agree that in e-discovery, ESI should always be hashed at the time of collection. This establishes the key point in time for proving that a copy is bona fide. But you also seem to suggest that an electronic file, and its accompanying hash, could be changed after collection, and that because of this possibility, your software should be used then too. Again, I disagree, but only because I do not consider this to be a realistic possibility if proper chain of custody is maintained. For the fraud you hypothesize to occur, there would have to be a break in chain of custody, and a criminal event during that break. Again, this is possible, but it is, in my opinion, a very far-fetched situation.

    Once a collection has been made, the first copy is secured, and multiple copies are then typically distributed to various interested parties. It is far-fetched to think that someone could then surreptitiously gain access to all copies of the hashed collection set and modify certain files and their hashes. This would require knowledge of the location of all such data vaults, and then inside help to break into them. A criminal who attempts to fabricate evidence in this manner would have to bribe or defraud, at the very least, the e-discovery vendors and the parties’ attorneys. If attempted, it would almost certainly be detected.

    Again, for me the slight risk of such criminal activity does not justify the use of encrypted time stamping at the time of collection as part of a new industry standard. Hashing at the time of collection, plus secure chain of custody, is sufficient, and has been recognized as legally sufficient in many cases by courts all around the country, as my article shows.

    Does anyone else have a view on this issue?

  12. Hello Ralph: I am the founder of TimeCertain, one of the companies cited in your article. I am also a practicing attorney, litigate digital evidence matters, am co-Vice Chair of the ABA Information Security Committee, and am the co-author of an upcoming American Bar Association book entitled “Foundations of Digital Evidence” (exp. pub. Spring 2008). One of the chapters for which I have chief responsibility deals with time and digital evidence. Together with Hoyt Kesterson, II (one of the creators of the original X.500/509 standard), I also presented at the RSA 2006 Security Conference on issues relating to the SHA-1 forced-collision research on hashing functions published by a Chinese researcher.

    Hash functions are useful, but they are not a panacea for all authentication ills. The utility of hash functions (and their output) should not be overstated by claiming for them what hashes do *not* provide. Further, within the context of authenticating digital evidence, it is important to keep in mind *what* it is that one is authenticating.

    The technical definition of a hash function is the manipulation of a variable-length input data string to produce a fixed-length output data string (typically shorter); that resulting output data string having two characteristics – (1) it is computationally infeasible to find two input strings that will produce the same output string, and (2) given an input string and the resultant output string, it is computationally infeasible to find another input string that will produce the same output string.

    While this data “fingerprint” may (absent a forced-collision vulnerability) be considered free from unintentional manipulation, it is *not*, without more, free from intentional manipulation by one in control of computer environmental variables. One of these variables is “time.” If one can change the time “known” by a computer generating relevant data, one can also change both the time of data and the data content at whim. One could, for example, reset a computer clock to an earlier date and create a document that appeared to third parties to be authentic.

    This becomes problematic with forensic imaging vendors who claim that performing a hashing function on a drive image proves the authenticity of the underlying data. It does not: a hash of a drive image that contains a back- or forward-dated document will not prove the *true* date of that data blob’s creation, or first instantiation. A hash function alone, therefore, will not detect intentional manipulation by the data generator.

    As for the drive image itself, the hash function will only prove that the drive image could not have been changed by anyone, again with the exception of the person who created the drive image and ran the hash function on it. The hashing function alone may help to narrow challenges as to who might have altered data, but it does not prove that the data (whether in the form of a drive image, file, or other bit stream) *has not been altered*.

    The addition of a digital signature to the hash of a drive image adds protection to digital evidence, but again, it does not prove that the data comprising the drive image was not altered or otherwise manipulated prior to the hash-and-sign function. What it will tend to prove is that the drive image itself could not have been changed by anyone except the person or persons who hold either the private key (in a PKI system) or the encryption key used to sign (or encrypt) that hash.

    Again, however, the same argument and challenge may be made. Those in control of environmental variables (time, encryption key, etc.) could backdate data, re-run a hash-and-sign process on a data blob (including a drive image), and then offer it up as authentic, with little or no way to prevent authentication by traditional FRE 901 methods. Two articles co-authored with Jeff Stapleton (the chair of the ANSI X9.95 standard described below), entitled “Digital Signatures are Not Enough” and “The Digital Signature Paradox” and published in 2005 and 2006 by the IETF Workshop and the ISSA Journal, discuss this in more detail.

    At best, what can be argued is that the binary data representing the drive image or files contained therein could not have been changed by anyone except those in control of the environmental variables.

    Again, even if one hashes and digitally signs a drive image, it narrows the potential actors but does not eliminate the possibility of intentional manipulation. It also does not prove any digital data file authenticity, and may result in the presumption of authenticity for manipulated data.

    Time adds another layer of protection for what I call “provably persistent data integrity,” that is, proving that digital data (evidence) is what it purports to be *at the time that relevance attaches to it*, and that it can be demonstrated that such data could not have been, and therefore was not, altered by anyone since that time. (Quotes are mine.) Not at the time a drive was imaged, but at the time relevance attached to the data, which means when it was created, transmitted, received, accessed, modified, etc. For paper, this is generally either presumed, or adequate forensic tests exist to ascertain it. For digital data, which consists of ordered sets of zeroes and ones, control over time by a data generator robs digital data of much of its authentication capability. I am currently litigating a spoliation motion in federal court where three versions of a piece of digital information, varying in time and content, have been offered as “identical” and “original.” The entity here has control over the environmental control variable of time, so this comes as no surprise.

    One way (and there are others) of generating digital evidence with provably persistent data integrity is to bind a trusted time value (typically from NIST) to digital data (preferably hashed and signed) in a cryptographically robust fashion. That is what trusted time stamping does, and the various methodologies (which are protocol-compliant) are set out in more detail in the ANSI X9.95 Trusted Time Stamping Standard published in 2005. Generally, trusted time stamping binds a token (fingerprint) to digital data at the time of first instantiation, such that if either the data blob or the token is thereafter altered, the alteration is immediately detectable.

    Of course, even a properly deployed trusted time stamping schema will be subject to the typical Rule 901 authentication requirements, but once those are met (and they can be met through either 901(b)(4) or 901(b)(9)), the authenticity of the data content qua content will be extremely difficult to challenge.

    So, that is my long way of saying that hashing is not enough for provably persistent data integrity. Digital signatures and hashing together are still insufficient. Adding trusted time stamping can be enough, if deployed properly.

    As for your contention that not everything needs to be time-stamped, I agree, with a qualification. That qualification is that one need only time stamp data which will or may be used as evidence in litigation or some other adjudicative proceeding.

    Best,
    Steven W. Teppler

  13. Hi Ralph, for ease of discussion my comments are referenced to your entry dated December 4, 2007 at 3:55 pm by paragraph number.

    Paragraph 2: I am not suggesting that all ESI created be time stamped. There are more significant record events than time of creation, such as time of contract execution, time of filing or time of corporate record declaration. These key business events give rise to important subsequent time-based events, such as the time of destruction based on a prescribed retention period.

    Paragraph 3: I take it that by “overkill” you mean a high level of assurance. Time stamping does provide a much higher level of assurance, but at a lower cost. This combined value is compelling as an alternative approach. Timestamping can be very useful in the e-Discovery process by providing a record-level method of irrefutably proving that the authenticity of each record identified as relevant has been preserved. It does so irrespective of where the record has been distributed, where it is stored or under whose control it has been. Therefore, it does not depend on a chain of custody involving parties over whom one has very little control. This is important when a number of individuals are involved in the e-Discovery review and analysis process, including opposing counsel.

    Paragraph 4: Using a hash as a method of identifying a record is not what we are talking about here. A hash can identify a record and demonstrate that an identical copy was made at a point in time. However, it does not prove the authenticity of the record. For that, one must reach back to demonstrate that the record is what it purports to be at the time the “assertion” in question is being made, not at the time it was identified as relevant in the e-Discovery process.

    Paragraph 5: How can one assert that a record is authentic (that it is what it purports to be) when it was only tagged with a hash at the time of collection? That is an overstatement, and a false one. The only assertion that can be made is exactly what you say: “this is the same file that existed in the producing party’s computers when the collection was made.” Period!

    Paragraph 6: In general, the earlier a record is time stamped (the earliest point being its time of creation), the more confidently its authenticity can be asserted. It is more valuable to associate the period of “persistent provable authenticity” with the time of a business-significant event, such as contract execution or corporate record declaration. Again, what matters for authenticity is the time from which the assertion is made. I will make the point again: if a person controls the safeguards around the data and has a motive to modify the record, that person could do so without detection, even if a hash is present. From then on, proving that identical copies were captured during the e-Discovery process will mean little to those affected by the modifications.
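The limitation argued in Paragraphs 5 and 6 can be illustrated with a small sketch. The record contents are hypothetical, and the example assumes that a digest taken at creation was preserved by some trusted means (for instance, the trusted time stamping described in the earlier comment). A hash computed at collection faithfully confirms that the produced copy matches what was collected, yet it cannot reveal an alteration made before collection; only the earlier digest can.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# ASSUMPTION: illustrative record history, not drawn from any real case.
original = b"Invoice 1001: amount due $5,000"
creation_digest = sha256(original)   # assumed preserved at creation, e.g. by a time stamp

# Someone who controls the system alters the record before collection.
altered = b"Invoice 1001: amount due $500"
collection_hash = sha256(altered)    # hash computed when the e-discovery collection is made

produced_copy = altered              # the copy later produced in discovery

# The collection hash only proves the produced copy matches what was collected.
print(sha256(produced_copy) == collection_hash)   # True
# Only the earlier, creation-time digest reveals the alteration.
print(sha256(produced_copy) == creation_digest)   # False
```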

    The “significance of this limitation” is that the assertion of authenticity is overstated, and it does not address the core requirement of Federal Rule of Evidence 901(a), in Article IX, which states: “The requirement of authentication or identification as a condition precedent to admissibility is satisfied by evidence sufficient to support a finding that the matter in question is what its proponent claims.” The best claim that can be made in your case is that the record did not change during the e-Discovery process. This adds little to demonstrating that the record is what it purports to be at the time the assertion was made.

    I disagree that the “burden lies on the objecting party to provide some evidence that the document is not genuine”; the burden to demonstrate the authenticity of the ESI being proffered is on the proponent of the information. I would refer to the recent American Express precedent, In re Vee Vinhnee, 336 B.R. 437 (B.A.P. 9th Cir. 2005): “The court declined to admit plaintiff’s computerized business records as inadequately authenticated …”. Vinhnee never challenged the authenticity of American Express’s records, nor did he even appear in court.

    I would also refer you to the Grimm decision, Lorraine v. Markel American Ins. Co., 241 F.R.D. 534 (D. Md. 2007), which stated: “… considering the significant costs associated with discovery … it makes little sense to go to all the bother and expense to get electronic information only to have it excluded from evidence … because the proponent cannot lay a sufficient foundation to get it admitted.”

    Paragraph 9: The greater benefit of timestamping lies less in detecting fraud, which is the exception, than in the ability to prove “good” behavior. It is the 99.999% of people who are good who need to be protected by an effective method to refute claims of inappropriate behavior. I would refer you to what happened to Arthur Andersen. After the rash of fraudulent corporate behavior (e.g., Enron and options backdating), there is a strong need to be able to quickly and effectively prove good behavior.

    We seem to have forgotten an important point. One cannot look at e-Discovery without the greater context of an organization’s governance responsibilities as they relate to the reliability of its corporate information. Regulations governing corporations, for example the Sarbanes-Oxley Act, specify the requirement to ensure the reliability of financial information systems and the integrity of financial records. This is where you start the process of ensuring the authenticity of records: to comply with your governance requirements. Then, if any of these records become relevant to e-Discovery, one can instantly prove authenticity and compliance.

    Paragraph 10: Which methods an organization adopts to ensure the authenticity of its corporate records is, in fact, a risk-based decision. It has two choices: an “inferred” approach based on external system-level (perimeter) controls, or an “intrinsic” method based on a data-level control. The approach taken then predetermines how the organization can later demonstrate the authenticity of its records in judicial or regulatory proceedings. The inferred authenticity approach is more costly, more complex and more easily challenged, so the level of assurance of successfully demonstrating authenticity is lower. In other words, the risk of not meeting the burden of proof is higher. Is this risk real? In the previously cited American Express precedent, the court excluded the company’s records because American Express was unable to establish their authenticity to the judge’s satisfaction, even after several attempts. It took a risk with “inferred” approaches and lost.

    Paragraph 11: As previously stated, the need is to demonstrate the authenticity of the record well before the e-Discovery process. Your presumption as it relates to e-Discovery is based on an effective “chain of custody” between multiple parties, some friendly and others not. Depending on external controls and trusted parties is a risk. Some will be willing to take that risk. Putting a record-level control in place eliminates it. It is a risk- and cost-based decision.

    Paragraph 13: Again, I would respond that the need to demonstrate the authenticity of a record in judicial and regulatory proceedings relates more to when the record “assertion” was made than to when the record was tagged as relevant in the e-Discovery process.

    Regards,

    Jacques R. Francoeur
    ProofSpace

  14. […] with 174 footnotes leaves you cold, I suggest you try my Hash Page summary instead, or my earlier blog on Hash. They will give you a pretty good idea of how hash is the mathematical foundation of e-discovery, […]

  15. Great article, thanks for the share. Blog bookmarked 🙂
