I was interviewed this week by Mary Mack, the Corporate Technology Counsel for Fios. People who heard the webinar seemed to like it, so this week’s blog is Part One of my off-hand remarks. Naturally it is slightly edited, visually enhanced, hyperlinked, with secret thoughts to self revealed. Since this project came in at over 9,000 words, I decided to break it up into two parts. There is only so much of my big mouth <and secret thoughts> anyone can tolerate in one sitting. If you want to hear the banter for yourself, you can download the original audio interview as a free Fios webinar. Thanks to Mary for the not-so-difficult grilling and to ever-pleasant Debbie Caldwell at Fios who set it up. No doubt this example of what it is like to talk to me for an hour about e-discovery will discourage anyone else from attempting such a conversation.
I became acquainted with Mary’s work years ago when reading her book, A Process of Illumination: The Practical Guide to Electronic Discovery. It was short, simple, and very practical. All good things. A new edition of her book came out in mid-2008, which is when I finally met Mary at a Fios dinner at LA Legal Tech. I discovered that, unlike the book, Mary was not short and simple, but rather free-wheeling and complex. I also learned that, unlike some in this industry, she genuinely loves computers and, like me, has a long history with them. Like me, she also writes a blog, Sound Evidence. She obviously knows her craft and has as much hands-on experience with e-discovery as anyone in the business. Finally, and here is the real reason I was delighted to accept her invitation to be interviewed, she has a sense of humor with an unpretentious, down-to-earth personality.
THE INTERVIEW
MARY: I want to thank Ralph for joining us today. He’s flying without a net. Very few presenters will come on without a prepared list of questions and answers, so we’re looking forward to a lively exchange and we invite your questions and comments.
RALPH: Well, thank you, and thank you everyone for tuning in. <Thought to self: why is she talking about “flying without a net”? Is this a set up? Wonder if she knows about that night in Vegas?> I hope we’ll keep this lively for you all, and please send us some questions. <Oh God, what have I gotten myself into?>
Grilling Hash
MARY: And so, Ralph, I want to turn it over to you for one of your more controversial initiatives around the new Bates stamp.
RALPH: <Phew!> So I think you want to start off with something that I did – it’s been over two years now <stalling for time, trying to recall now what I said about hash> – where, after being out of law school for almost 28 years, I decided to write a law review article <boy was that a dumb idea!>. I actually got inspired by Craig Ball on this <still stalling> to write on the subject of <oh yeah, now I remember> the mathematical hash algorithms, MD5 Hash and SHA Hash. So the title of this is Hash – The New Bates Stamp.
If you want to go ahead and show the rest of the slide for that (see above). Most of you on the phone certainly have heard of Hash already, and you know that it is not something that goes with corned beef and eggs, but it could. The first line on the slide is an example of an MD5 Hash, which is actually shorter than a SHA Hash. You can see it’s a lot of numbers and letters, alphanumerics.
The brief summary on what Hash is – and I’m not going to go into a lot of detail on it, although I’ll be happy to answer your questions – is that a Hash is a type of metadata for a file, generated through a mathematical formula that distills a computer file down to a unique alphanumeric value. The Hash serves as the unique fingerprint of the computer file. You can see the numbers here: there are 340 billion billion billion billion <Carl Sagan would be proud> possible values, which tells you how small the odds are of a coincidental collision, where one computer file has the same Hash as a different computer file. It is on the order of a billion times less likely to happen than similar DNA mistakes. <Or was it billion, billion? Hope Craig Ball isn’t listening.> So this is something that is very good for authentication, for proving that a computer file is what it is. It is far better than the DNA evidence used in criminal cases to identify a person; the Hash can identify the computer file.
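To see the fingerprint idea in action, here is a minimal sketch using Python's standard-library `hashlib` module. It hashes two sample strings (chosen for illustration, not taken from the article) that differ by a single period, and prints the size of the MD5 value space, 2 to the 128th power, which is the “340 billion billion billion billion” figure.

```python
import hashlib

# MD5 yields a 128-bit digest, conventionally written as 32 hexadecimal characters.
MD5_BITS = 128
POSSIBLE_VALUES = 2 ** MD5_BITS  # roughly 3.4 x 10^38, i.e. 340 billion billion billion billion


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy dog."  # one added period

print(md5_hex(original))         # 9e107d9d372bb6826bd81d3542a419d6
print(md5_hex(altered))          # e4d909c290d0fb1ca068ffaddf22cbd0
print(f"{POSSIBLE_VALUES:.3e}")  # 3.403e+38
```

A one-character change produces an unrelated digest, which is what makes the value useful for spotting an altered file. MD5 is no longer considered resistant to deliberately engineered collisions, so SHA-256 (a SHA-2 variant) is a common choice where that matters.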
MARY: So, Ralph, I think the controversial part – from what I’ve heard at various conferences – is a reluctance to use that really long number as a document identifier. People imagine being at a deposition saying something to the effect of, “Do you recognize document number 588BCBD1845 etc., etc., etc.?”
RALPH: That was the problem. So that’s where I had the idea and, you know, it was floating out there, but I sort of took the initiative that what we need to do is just use the first and last three values. In other words, you truncate the entire Hash value and only use the first and last three places. On the slide you see an example of a long MD5 Hash number. By truncating it and using the first and last, you take the first three alphanumerics, which here all happen to be numbers, 588, you put a period to indicate that’s the end of the first three and then you go to the last three values, here 459. That then becomes the truncated or abbreviated name for the entire Hash (588.459). That would be the name you would use as a label for a shorthand identification of the electronic computer file itself, which might be a document, it might be a video, it might be a piece of music, it might be an entire hard drive with a hundred million documents on it. Whatever it was you were authenticating and you were identifying as a computer unit, it can generate a Hash value.
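The truncation rule is simple enough to express in a few lines. A minimal sketch, assuming hex digests as input; the function name `truncate_hash` is hypothetical, upper-casing is an assumption made to match the slide's display style, and the sample digest is the MD5 of “The quick brown fox jumps over the lazy dog” rather than the full value on the slide.

```python
def truncate_hash(full_hash: str, n: int = 3) -> str:
    """Abbreviate a hex hash to its first and last n characters, joined by a period.

    Upper-casing is an assumption made here to match the slide's display style.
    """
    h = full_hash.strip().upper()
    if len(h) < 2 * n:
        raise ValueError("hash is too short to truncate")
    return f"{h[:n]}.{h[-n:]}"


print(truncate_hash("9e107d9d372bb6826bd81d3542a419d6"))  # 9E1.9D6
```

A value that began with 588 and ended with 459 would come out as 588.459, as on the slide.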
I have determined – thanks to my friend Bill Speros, who ran the computer test models on this – that even if you just use the first and last three characters, you are still unlikely to have a collision. There’s a footnote in my law review article on this, but if I recall correctly <which I usually don’t, so read the article; it did take me about 1,000 hours of work, and now I’m supposed to summarize it in two minutes!>, it was something like 97% of the time you’re going to be okay. Three percent of the time, you may come up with a match, in which case you’d then consult the full Hash value itself. In all likelihood the two computer files would, in any event, be totally different. One might be a video and one might be a document, so truncating to the first and last three places is unlikely to create any confusion.
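How often two files share a truncated label depends on how many files are in play. The sketch below is a standard birthday-problem estimate, not the method behind the test models mentioned above; it assumes each six-character label (three plus three hex characters, so 16^6 = 16,777,216 possibilities) behaves like an independent, uniformly random value.

```python
import math


def truncated_collision_probability(num_files: int, hex_chars: int = 6) -> float:
    """Probability that at least two of `num_files` files share a truncated label.

    Assumes each label is an independent, uniformly random string of `hex_chars`
    hexadecimal characters (the birthday problem).
    """
    space = 16 ** hex_chars  # 16**6 = 16,777,216 possible "ABC.XYZ" labels
    if num_files > space:
        return 1.0  # pigeonhole: a shared label is guaranteed
    # P(all distinct) = product of (1 - i/space) for i = 0..num_files-1, summed in log space
    log_all_distinct = sum(math.log1p(-i / space) for i in range(num_files))
    return 1.0 - math.exp(log_all_distinct)


for n in (100, 1_000, 10_000):
    print(n, round(truncated_collision_probability(n), 4))
```

Under these assumptions, about a thousand files gives roughly a 3% chance that some pair shares a label, and by ten thousand files a shared label somewhere is very likely. That is consistent with treating the truncated label as a convenient nickname and the full Hash as the tiebreaker; the collection size behind the 97% figure is not stated here.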
MARY: And then if you do as you have in your last line, adding other naming conventions such as the custodian or the case or the production, the way Bates numbers are normally used, that would narrow it down even further.
RALPH: Exactly. That way you can know what this computer file is without necessarily opening it, because you won’t have all the values memorized. It certainly makes sense to add some sort of name too. My proposal is to use the pound (#) sign as the symbol to indicate that a truncated Hash number begins after the pound sign, to the right of it. <This is boring, even for me. Must try and spice it up somehow.> The reason I use the pound sign is that in most of the world, the # key on the keyboard is called a hash mark. As a kind of concession to looking at this globally, we’re using the pound sign to mean hash, like it does in most places. Then you can put whatever kind of identification you want to the left of it. Some people who are just very insecure unless they look at things sequentially could even put a Bates number to the left of that.
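Putting the pieces together, a label is an optional prefix, the # mark, and the truncated Hash. A minimal sketch, assuming MD5 (to match the slide) and a hypothetical custodian/production prefix `SMITH-PROD001`; the helper names are illustrative only. The check recomputes the Hash and compares only the truncated part, so it is a quick screen, with the full Hash value remaining the authority.

```python
import hashlib


def short_hash(hex_digest: str, n: int = 3) -> str:
    """First and last n hex characters of a digest, joined by a period."""
    h = hex_digest.upper()
    return f"{h[:n]}.{h[-n:]}"


def make_label(data: bytes, prefix: str = "") -> str:
    """Build a 'PREFIX#FIRST.LAST' label from the MD5 of `data`."""
    return f"{prefix}#{short_hash(hashlib.md5(data).hexdigest())}"


def label_matches(label: str, data: bytes) -> bool:
    """Recompute the MD5 of `data` and compare it to the text after the last '#'."""
    _, sep, short = label.rpartition("#")
    if not sep:
        raise ValueError("label has no '#' hash marker")
    return short.upper() == short_hash(hashlib.md5(data).hexdigest())


doc = b"The quick brown fox jumps over the lazy dog"
label = make_label(doc, prefix="SMITH-PROD001")  # hypothetical custodian/production prefix
print(label)                                      # SMITH-PROD001#9E1.9D6
print(label_matches(label, doc))                  # True
print(label_matches(label, doc + b"."))           # False
```

Splitting on the last # keeps the check working even if someone's prefix happens to contain a # of its own.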
MARY: Are you calling us insecure, Ralph? <Uh-oh. I stepped in it there.> At Fios, we have what we call the Fios Electronic Numbering System. It’s a FENS number, a sequential number that we make available to our clients if they wish to identify the native files that way.
RALPH: It’s a habit. <Quick change subject.> I mean, an interesting thing about the Bates stamp – and my article goes into this if you’re a history buff – is that it was invented in the 1890s and Thomas Edison thought it was such a good invention, he purchased the patent to it and took over the Bates stamp company. And so this was very high-tech state-of-the art in the 1890s. <Oh no. I think I just rubbed it in.>
MARY: Okay, Ralph. <Now she’s pissed! Here come the Vegas questions.> Our legacy file numbering system, then – actually it’s quite simple – is a sequential number. That is a very easy way to number things, and there are clients that like to have productions without gaps in the Bates numbers, which this allows as well.
We have a question <Thank God for the listeners.>: assuming somebody uses the Hash code for native files in place of the Bates number, and assuming that it acts as a kind of self-authentication, how do you suggest that the identity of a particular file be verified on the fly, such as at a deposition when the opposing party puts the file up on the screen?
RALPH: Well, funny you should ask. Actually, on the concluding page of my article <please, someone read it!> I envision how this will be handled in the future in depositions and in fact at trial – and honestly, I have not done this yet myself, so it’s still imagination <hate bs’ers>. It would be a matter of taking a computer file; say I’m going to ask you to authenticate this computer file. It would either be beamed to, or transferred on to, your computer. You would instantly run a Hash check on that computer file. If it came up with the Hash number that you expected and knew to be the number identified with the computer file that was yours, that you created or had otherwise seen, then you would say yes, I can identify that as the computer file that I created, or whatever.
So you would be doing it by instant Hashing and by the way, that is not far-fetched because Hashing is incredibly fast. I mean, we’re talking about millionths of a second to Hash a typical file, and there are programs identified in my law review article where I refer to a couple of websites where you can download Hashing software for free. Anybody can do this now and it’s kind of an interesting thing to do, to see what the Hash values are and to do that right away. It really is the best way to verify that nobody has messed with your computer file, maybe changed a sentence here on purpose or maybe just accidentally, it’s been slightly corrupted unintentionally.
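The on-the-fly check amounts to recomputing a file's Hash and comparing it with the value recorded earlier. A minimal sketch using Python's standard-library `hashlib`; the SHA-256 default, the chunk size, and the function names `file_digest` and `verify_file` are assumptions for illustration, not anything specified in the article or by a particular e-discovery tool.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's current digest equals the digest recorded earlier."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demo: record a digest, then re-check the same file after a small edit.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "exhibit.txt")
    with open(path, "wb") as f:
        f.write(b"Original exhibit text")
    recorded = file_digest(path)
    print(verify_file(path, recorded))  # True
    with open(path, "ab") as f:
        f.write(b" (edited)")
    print(verify_file(path, recorded))  # False
```

Passing `algorithm="md5"` works the same way, as long as both sides used the same algorithm when the value was first recorded.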
MARY: Would you suggest agreeing on the contours of what would be included in the Hash at the meet and confer? For example, while whole files or disks would hash identically with the same hash tool, email needs consistent definition of what fields are included, how dates and names are represented and how exceptions like “no subjects” are handled.
RALPH: Well, you might want to, especially as to the email. It would also be nice to pick a particular type of Hash: MD5, SHA-2 or whatever. There’s a little bit of variety to consider – the many flavors of Hash. <No laughs? I thought it was funny. Guess she’s still pissed.> Most companies, though, are still using either MD5 or SHA-2. That would be good to know. And it would be good to select a naming system. For instance, you could agree to put the Hash in your actual production when you identify files. That way, two years later, we can make sure that the file hasn’t changed by comparing its truncated name to its current Hash value when we run a Hash check on it.
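Mary's point about email is that the parties must agree on exactly what goes into the Hash before either side's values can be compared. A minimal sketch, assuming a hypothetical agreed field list and hypothetical normalization rules (lower-cased, sorted addresses; dates converted to UTC; an empty subject recorded as `<no subject>`); none of these choices come from the interview or from any standard.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Hypothetical field list; the real one would be negotiated at the meet and confer.
AGREED_FIELDS = ("from", "to", "cc", "date", "subject", "body")


def _addresses(value) -> str:
    # Lower-case and sort addresses so recipient order and capitalization don't matter.
    return ";".join(sorted(a.strip().lower() for a in (value or [])))


def canonical_email(msg: dict) -> bytes:
    """Serialize the agreed fields in a fixed order with the agreed normalization."""
    parts = []
    for field in AGREED_FIELDS:
        value = msg.get(field)
        if field in ("to", "cc"):
            text = _addresses(value)
        elif field == "from":
            text = (value or "").strip().lower()
        elif field == "date":
            # Expects a timezone-aware datetime; normalize to UTC ISO 8601.
            text = value.astimezone(timezone.utc).isoformat() if value else ""
        elif field == "subject":
            text = (value or "").strip() or "<no subject>"
        else:
            text = value or ""
        parts.append(f"{field}:{text}")
    return "\n".join(parts).encode("utf-8")


def email_hash(msg: dict) -> str:
    return hashlib.sha256(canonical_email(msg)).hexdigest()


a = {"from": "Alice@Example.com", "to": ["bob@example.com", "carol@example.com"],
     "date": datetime(2009, 3, 1, 12, 0, tzinfo=timezone.utc), "subject": "", "body": "Hi"}
b = {"from": "alice@example.com", "to": ["Carol@Example.com", "bob@example.com"],
     "date": datetime(2009, 3, 1, 7, 0, tzinfo=timezone(timedelta(hours=-5))),
     "subject": None, "body": "Hi"}
print(email_hash(a) == email_hash(b))  # True
```

The two sample messages differ only in capitalization, recipient order, time-zone display, and how a missing subject is stored, so under these rules they hash identically. Change any agreed rule and the values diverge, which is why the rules belong in the meet-and-confer agreement.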
MARY: And we’ve got a couple of questions on whether there should be a standard Hash tool, and whether you have a recommendation: a standard Hash tool for both sides, since the results could be different if different algorithms are used.
RALPH: Well, I think all e-discovery vendors – and correct me if you think I’m wrong, Mary – are Hashing files automatically as part of the service. Everybody does that. <Geez, I can’t believe I just said that. I sure hope Fios doesn’t charge extra for hashing. If they do, Mary will really be pissed.> So it is already common practice to include Hash as metadata that you create and produce along with the native files, or even with loads that have been stripped. It is already, I think, standard practice. <Oh no. I said it again.> It would be nice if the industry would pick a particular variety of Hash, but I don’t think it matters that much. There are really only two primary ones now anyhow, and it’s a simple matter to rehash things with a different type if you need to.
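Rehashing with a different algorithm does not require a second trip through the data; one read can feed several algorithms at once. A minimal sketch using `hashlib.new`; the default algorithm list is an assumption (SHA-256 is one member of the SHA-2 family, and SHA-1 is included only as another commonly seen type), and the function name `multi_digest` is hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def multi_digest(stream: BinaryIO,
                 algorithms: Iterable[str] = ("md5", "sha1", "sha256"),
                 chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Read a stream once and feed every chunk to several hash algorithms."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


digests = multi_digest(io.BytesIO(b"The quick brown fox jumps over the lazy dog"))
for name, value in digests.items():
    print(f"{name:>6}: {value}")
```

The same bytes produce different values, and different lengths (32, 40, and 64 hex characters), under each algorithm, which is why both sides need to know which one a produced Hash came from.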
Book Talk
MARY: So, Ralph, moving on to your books <finally!> how are you feeling about having your books cited by Judge Facciola?
RALPH: Well, that was a big honor. You know, that’s why we all write this stuff. Same thing with law review articles and books – you want to be read, but you really want to be cited. <Boy, that sounds vain.> You know, it’s not just me, it’s anybody who takes the time to do legal writing. <Hey, it’s hard and you get paid zip.> So, yes, I was very happy to see that. I’m happier still that it is being warmly received by the e-discovery community, even though I’m fairly new to being a person who’s sort of out there talking about it. People have been very receptive and very complimentary about it, and I certainly appreciate that. <Thank God everybody is generally nicer than I am.>
MARY: It’s an unusual book, isn’t it, in that it is a collection of blog posts that are published by the ABA, which is usually more conservative?
RALPH: Yeah, can you believe that? <Oops, too honest.> No – and both West and Lexis, they were all interested in it, but little old ABA got it together and said, you know, we want to do it and we’re going to do it fast, because that was one of my requirements. There’s nothing to getting something electronically published, but with some paper publishers, it takes six months or more for publication. You know, that just wasn’t acceptable to me. So, the ABA said they’d do it quick and they did. They did the first book very quickly. We got it to print in less than three months. They did it again with the second book.
Also, the ABA was kind enough to let me keep the electronic rights and all of this material online, so you can look at most of the content of these books for free. <Oh no, that was dumb. I’ll be hearing from the ABA now for sure.> I would prefer that you still buy the books <back pedaling fast now>, just to help support the ABA in their efforts, but you can see pretty much the same material by looking at my blog, where I have kept most of these same posts online. <Oh no, I did it again.> The book is a little bit different from what you will find on the blog, in that I’ve edited and improved it a little bit, but it’s pretty much the same material, only organized by subject matter. The blog, of course, is just chronological, according to whatever is going on lately. Like, you know, the latest thing now is that crazy case involving The Secret movie and attorney-client privilege.
Tolkien’s Evil Eye
MARY: Oh, your favorite case on The Secret. Exactly. Those of you who read Ralph’s book, or are considering it, should go to the blog, because in addition to his analysis, he also has wonderful links, and wherever he can, he gets source material. So it’s quite a read, and unlike most bloggers, he will go through and analyze cases on a consistent basis. Shall we move on to the trial lawyers?
RALPH: <No. Please. Not trial lawyers. You are really going to get me in trouble.> Well, before we do that, I have something that might make it a little fun for the listeners here. <Change subject. Maybe she’ll forget the question.> I challenge you to take a look at my latest blog and see if you can identify the eye on the top of the pyramid and tell me where I got that eye from, because it’s not just any eye; it’s a rather famous eye. So if you can identify the eye, let us know by putting it in the form of a question, and we will recognize the person providing the first correct answer over the air as a truly savvy, culture-knowledgeable person.
MARY: Now there’s a question. <Good. She’s forgetting already.>
RALPH: I like to ask questions back. It’s a lot of fun to do these things, CLEs, but it’s even more fun to teach in law school, which I’ve been doing since the beginning of the year, because you get to ask questions. You spend a lot of time asking questions and, of course, in law school you can force them to answer. I can’t force any of you folks to answer, but I’m asking you a lot easier questions, too.
MARY: Well, we’ve got – we’ve got three answers and the first one, we’ve got is the Lord of the Rings eye. Is that right, Ralph?
RALPH: That is right, but which eye is it in the Lord of the Rings? <Like there is more than one, but let’s stretch this part out and maybe she’ll forget the boring trial lawyer critiques.>
MARY: Ohhhhhhhh.
RALPH: The first answer is very good. Danny K.: What is the Eye of Sauron? Very good, Danny K. And so – we got a lot of good answers here. Yeah, that’s the evil eye, the Eye of Sauron. The one Tolkien made famous. So thanks a lot.
Now, what were you – what did you want to ask about next here? <Please. Anything but trial lawyers.>
Everybody’s Talking About Trial Lawyers
MARY: Oh, man. After that, I don’t know. <Yeah!> I think we were moving into the trial lawyers and your very provocative “Flat Earth Society Admits World is Round and Wants to Learn to Circumnavigate” blog post. <Oh no. She’s got a good memory. I’m screwed now.> Want to talk about some of the proposals from the trial lawyers to limit e-discovery?
RALPH: <No, not really.> Yes, and I wrote two more articles on this subject: Why E-Discovery is Ruining Litigation in America and What Can Be Done About It, and Trial Lawyers Turn a Blind Eye to the True Cause of the e-Discovery Morass. One, two, and three there on the slide are the names of the three different articles that I wrote on this subject. <Hey, if you guys would just read the blogs, I would not have to talk about it.> The first one, Trial Lawyers Turn a Blind Eye, I wrote after I saw the preliminary report produced by the American College of Trial Lawyers, which is a very esteemed group of trial lawyers from around the country. You have to be nominated. <I’ll never get in now, especially after writing those articles.> It’s limited to only a couple per state. And these men and women really are excellent trial lawyers, highly skilled in jury trial. Most of the members are even older than I am, and I’m, you know, 57, I think. <Good God, I am not even sure how old I am.> Most of these fellows are in their 60s and maybe in their 70s, and they are very polished at what they do. <Which, believe me, has nothing whatsoever to do with computers and e-discovery.> But they did a survey of these lawyers as to what they think is wrong with civil litigation in America today. And a vast majority of them – not all of them, but a vast majority – pointed the finger at poor little e-discovery and said it’s electronic discovery that’s to blame for the high cost of litigation today. <Much like me blaming all of the problems on poor jury voir dire, whatever that is.>
So this preliminary report, which heaped blame on e-discovery, precipitated my first article. I pointed out, from a careful study of their published survey of members of the American College of Trial Lawyers, that the same lawyers who were criticizing electronic discovery – putting the blame primarily upon the new federal rules as causing all the problems, and secondarily upon the judges who aren’t really doing their job right – had never even had an e-discovery issue. So they were basing this criticism on – I don’t know – rumor and innuendo. They’d heard about other people having e-discovery problems, because most of these men and women had never been involved in e-discovery at all. They had focused on other aspects of litigation, like actually trying cases. <Maybe that last comment will placate them.>
So the Trial Lawyers preliminary report precipitated a response on my part <ok, it was a rant>, the first one where I basically pointed out to them that the true cause of the issues with e-discovery was a competency question, in that people mess up e-discovery when they don’t know what they’re doing. That is just like anything. If you had me pick a jury tomorrow, I would probably mess it up, because I – yeah, I think I’ve only done that a couple of times in my life. Whereas members of the American College of Trial Lawyers, they would do a super job picking a jury, but they’ve probably never done e-discovery. So it – you know, it’s a matter of what you’re trained to do, where you’ve spent your time. In the first article, I pointed out to everyone what was going on with the American College of Trial Lawyers, their preliminary report, and why I thought they were wrong, so that was the first one.
My article was read by some of the members of the College of Trial Lawyers, and in fact, later on I even got an e-mail from somebody connected with them saying that they were fans of my blog. <Maybe it was a form email?> So there were at least some people in the group who heard what I said. But before I got that feedback, I wrote another article on why e-discovery – this was tongue-in-cheek – is ruining litigation in America, again sort of taking their point of view and ridiculing it a little bit: Why E-Discovery is Ruining Litigation in America and What Can Be Done About It.
Then there was the final article, which I just wrote this year, 2009: Flat Earth Society Admits World is Round and Wants to Learn to Circumnavigate. I wrote it right after the trial lawyers issued their final report. Before, there had just been a preliminary report on which they welcomed comments. In their final report, they made some, I think, significant changes in the wording insofar as e-discovery is concerned, which is why I sort of declared victory here.
Flat Earth Society people who hadn’t seen e-discovery before and believed that the world was paper now admit in their final report that electronic discovery is very important. And they want to learn how to go around this new round earth. They want to learn how to do e-discovery, and so they recommended that there be training in e-discovery. Of course, they started off primarily recommending training for judges, but now also for trial lawyers, to learn about e-discovery. I think they’ve made a significant change, and the question now is whether they will follow through and actually spend the time necessary to learn what I think – and they now agree – is an important part of litigation. <I know I am not interested in getting jury voir dire training, so why should they take the time for e-discovery training? Of course, I don’t try to pick a jury. I leave that to my partners who are experts at that sort of thing. So why can’t they?>
Defending the 2006 e-Discovery Rule Amendments
MARY: So, Ralph, they’ve made some strides on the e-discovery, but they also have some pretty radical proposals around changing the Federal Rules of Civil Procedure, don’t they?
RALPH: Well, that’s not really in this report, but there are talks going on; they’re unhappy with the rules, and they may seek to provide input to the committee. <You don’t see me going around trying to get rid of jury trials because I don’t understand them. Sheesh!> I think this is – well, I know it is – based upon their not really understanding the new rules or not having had them explained properly. <Be polite.> Because when you understand the new rules, you see that they help curb abuses with electronic discovery that might otherwise occur, and that they help protect people from overly expensive or overly burdensome e-discovery.
Regardless of whether there were new rules or not, there would still be e-discovery, because even before the new rules were enacted, there was discovery of electronic documents. There’s discovery of electronic documents in every state in the union. The difference in the federal system is that you have rules that provide greater certainty and protection. In states that don’t have such rules, like my state of Florida, you’re left with less protection and greater uncertainty as to what the court might do than you’d have in the federal system.
MORE TO COME: Stay Tuned for Part Two of the grilling by Mary Mack where I talk about “Angry Ostriches, Gladwell’s 10,000 Hour Rule, Jack Nicholson, Pretend Lawyers, Volunteers for America, and a Tad More.”
[…] Mary Mack Grills Me on Hash, Nervous Bates Stampers, Trial Lawyers … […]
Thank you SO much for this!
By revealing the beautiful minds of people [the IT crowd ] who really enjoy the intersection of computers and the law, you’ve made it easier for the rest of us to hold somewhat normal conversations with our less-dorky brethren. [ sheepish grin ]. REALLY look forward to meeting you!
On hashing: I think the distinction between hashing a collection and hashing a file is at least a little [ really freakin’ ] important.
The main benefit of hashing a collection is to look for modification. It’s a way of answering the question : “are ye still the way I left ye?” It doesn’t tell you ANYTHING about where a file has been or who had access to it.
My guess is that collection-level chain of custody will lose relevancy with cloud computing, because what is important is not the location of the file (or pieces of it), but the contents of the file and the identification of people who had access or exercised dominion (e.g. by modifying it). A collection hash won’t tell you much about the integrity of the original collection, just whether or not it has changed.
A file hash is beautiful because it can follow a file through infinite litigation matters. If a company starts early, it can eventually get doc review to the point of only doing “diffs” and then lawyers will have increased (and most importantly) *trackable* legal metadata around a file.
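The commenter's distinction between a collection hash and a file hash can be made concrete. A minimal sketch, assuming the collection hash is computed over a sorted manifest of path and per-file hash pairs; that is one common approach among several (forensic tools often hash a disk image's raw bytes instead), and the function names are hypothetical.

```python
import hashlib
from typing import Dict


def file_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    """Per-file SHA-256 of content only; the value follows the file wherever it goes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def collection_hash(files: Dict[str, bytes]) -> str:
    """One SHA-256 over a sorted manifest of (path, file hash) lines for the whole set."""
    manifest = "\n".join(f"{path}\t{digest}" for path, digest in sorted(file_hashes(files).items()))
    return hashlib.sha256(manifest.encode("utf-8")).hexdigest()


before = {"inbox/memo.docx": b"memo text", "inbox/plan.xlsx": b"plan data"}
after = {"archive/memo.docx": b"memo text", "inbox/plan.xlsx": b"plan data"}

print(collection_hash(before) == collection_hash(after))                                  # False
print(file_hashes(before)["inbox/memo.docx"] == file_hashes(after)["archive/memo.docx"])  # True
```

Moving the memo changes the collection hash, which says only that something in the set is different, while the memo's own file hash is unchanged and can follow it from matter to matter.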
Here’s my thought for the day:
I don’t know if “differential hashing” (for lack of a better phrase) is technically possible without being hackable, but the cool factor would be off the charts. Imagine being able to algorithmically designate a lead document and then auto-hash derivatively off of that based on conceptual similarity! [ Yum! ]
The hash part (not the conceptual part) is relatively “easy” and can be done on the corporate server level without employee involvement. Further, it would be a simple matter to provide separate Bates-hashes in the event [ pigs fly and ] two different docs from two different firms carried the exact same hash [ flip that coin 340 billion billion…times ]. It would take at most a few days to work through several TB of data and could be used to mark the inevitable porn and personal stuff employees insist on making doc reviewers look at time and time again.
Can’t see any reason why courts wouldn’t find that an awesome proposition. And it would work its way right through the Socha EDRM model all the way to court where… :
Attorney : “Yer Honor, I’d like to show the witness Exhibit # 7d7c4a4c485735546f315536634b464a405c31474a7e573e6577397068547979…”
Opposing Counsel : “Wait! Do you mean Exhibit # 7d7c4a4c485735546C315536634b464a405c31474a7e573e6577397068547979…??”
Attorney : “No. I mean Exhibit # 7d7c4a4c485735546f315536634b464a405c31474a7e573e6577397068547979! Counsel is well aware that Exhibit # 7d7c4a4c485735546C315536634b464a405c31474a7e573e657739 was ruled inadmissible, which means the jury won’t get to hear about it. I said Exhibit # 7d7c4a4c485735546f315536634b464a405c31474a7e573e657739 and that’s what I meant!”
Jury : “ZZZZZzzzzzzz”
Solution : “Tinyurl” them and use a bar code, no problem. It’ll make it easy to “show” opposing counsel the document (using the bar code) and the judge can then flip a switch making it appear on the jury’s monitor (think of it as a judicial video game) once the admission ruling has been made.
Are there any plans to offer an RSS feed for this site? I find it very informative and amusing.
[…] or some of them at least, that occurred to me during this interview, much like I did with the Mary Mack interview. I will skip over the interview introductions and get right to the […]
[…] that make me look good. Also, as I have done before in such interviews, most famously in the brutal Mary Mack interview, I once again share a few of my <Secret Thoughts> to try to make the reading […]
Very engaging topic. It’s long, but it was enough to give me a little understanding of the logic behind eDiscovery.
[…] to Jackson Lewis and Ralph Losey as he joins the firm tomorrow. My secret thought is that Jackson Lewis will emerge as an ediscovery powerhouse as Ralph takes the reins of their […]