No, I’m not talking about Chinese food, I’m talking about an Outlook file format. Since Outlook email is at the heart of many, if not most, electronic document productions today, it is essential to understand some of the different file formats this software uses, especially MSG and PST. Otherwise you can easily fall into an expensive and confusing MSG e-discovery trap. So, just as you want to avoid MSG in your food, you want to avoid it in your e-discovery production too. This expanded revision of a prior blog entry will explain why.
On individual computers, Outlook can store emails in two different formats with two different file name extensions: MSG and PST. MSG stands for what you would think, “Message”. It is the file extension used to identify a single email message. The PST extension is a Microsoft speciality that stands for “Personal Storage Table.” It is used to identify all of the emails (with attachments) stored by one particular user. This is how almost everyone maintains and uses their personal Outlook email program. They keep their email in various folders, which all together make up one PST file. Indeed, this is the default procedure, although a user can (assuming no administrative restrictions) separate their emails, and scatter them all over their computer as many separate, unrelated emails. In this event, the different extension of MSG is used to store the individual emails.
The situation is different in a corporate or enterprise server environment, something which I did not mention when I first wrote about this. On Microsoft Exchange servers, which sit behind Outlook in most organizations, the emails of individual users in the server group are all stored together in a single container file, an electronic database file with the file extension EDB. (In Lotus email systems, it’s called NSF.) The individual users on the server do not have individual PST containers on the server, but individual PST files can easily be created as copies from the master EDB file on the server. This can be done at any time by the server administrator, or even by the individual users, unless this feature has been disabled.
This function of Outlook is frequently disabled for individual users because otherwise, users can create their own PST files on their hard drives, or even on their own portable storage devices, and the system administrator will never know about these backup files. This makes it very difficult to locate all emails in a large organization to find information, implement a legal hold, or collect responsive ESI. When there is a proliferation of unknown PST files, it is impossible to know if the EDB file is complete. This is because some emails in a user’s section of a master EDB file may have been deleted from the EDB file, but still remain on the previously generated PST file.
In a corporate server environment, to respond to a native file production request or preservation notice, the email of the individual users affected must be copied from the master container EDB file having the email of everyone, into individual PST files of the users whose email might be relevant. The PST files created from the master EDB file must be searched for relevant emails, and non-responsive and privileged emails deleted, and then a new responsive PST file reconstituted for production. (Also, in an environment where users have the ability to create their own PST files at any time, you must ask about and preserve/search through these other PST files as well.)
Craig Ball, an e-discovery expert with a deep understanding of forensics and technology, correctly points out that this is not a pure native production; that would require production of the original EDB container file. In his excellent article, Re-Burn of the Native, found at page 73 of Musings on Electronic Discovery, Craig calls it a “Quasi Native Production” and explains:
Chockablock [yes, he talks like that] as it is with non-responsive material, there are compelling reasons not to produce “the” source PST. But there’s no reason to refuse to produce responsive e-mails and attachments in the form of a PST file, so long as it’s clearly identified as a reconstituted file containing selected messages and the contents fairly reflect the responsive content and relevant metadata of the original. Absent a need for computer forensic analysis or exceptional circumstances, a properly constructed quasi-native production of e-mail is an entirely sufficient substitute for the native container file.
Craig goes on to say that the production does not have to be made in a PST file to be “Quasi Native”; it could also be produced as MSG files. Although that is certainly true, as MSG is also native to Outlook, in my opinion the PST format is typically preferred. To understand why, it is helpful to use a paper file comparison.
Outlook, by default, keeps an individual user’s emails all together in a filing cabinet type structure. Received emails start in the Inbox folder. The user can then create various subfolders to file the emails for easier access later. It is equivalent to providing a filing cabinet to store paper letters, but with a virtually unlimited number of blank folders and cabinet space. Just as in a paper filing system, with Outlook (and most other email software, such as Lotus), you label the folders yourself, and file your emails in the folders you deem appropriate. This should result in some kind of rational record storage system that makes sense to the user and allows them to retrieve old letters/emails more easily. The folders’ names and ordering system often provide useful insights into the user’s thinking, and sometimes help to explain the meaning of a particular document. For instance, if a user created a folder called “Important”, their decision to place a particular document in that folder tells you something about the document itself, or at least about the user’s attitude toward that document. So when you take a single email out of the Outlook folder, it is equivalent to removing it from a paper file folder and keeping it loose on your desk (or floor).
Parties today frequently specify the production of files in their Native format so that all metadata will be preserved. Indeed, most commentators agree that Native file production under the new rules, specifically Rule 34(b)(ii), is now the default mode of production absent agreement by the parties to the contrary. (There are, by the way, many good reasons to agree to non-native file production, so long as essential metadata is still preserved; these pertain to the advantages of loaded TIFF files and trial preparation software.) Moreover, most believe that the primary purpose behind this rule specification is to preserve metadata. Rule 34(b)(ii) states:
(ii) if a request for electronically stored information does not specify the form or forms of production, a responding party must produce the information in a form or forms in which it is ordinarily maintained or in a form or forms that are reasonably usable.
An argument can be made that both types of Microsoft email files, individual MSG files and collective PST files, are “Native” files, since they are both produced and used by Outlook. In that sense, they are both native to that software. But it is the PST form in which almost everyone ordinarily maintains their individual Outlook emails, not the MSG form, and so, in my opinion, that is the form contemplated by the rule. (I rule out production of the original native file in an enterprise server environment, production of the EDB or NSF files, because they include all emails of all users in the enterprise, and that would almost never be relevant, and would otherwise be unwise, as Craig Ball explains well in his Re-Burn of the Native article.)
Rule 34(b)(ii) also provides for production in a form alternative to the native “ordinarily maintained” form, by specifying that production can also be made in “forms that are reasonably usable.” Under the alternate “reasonably usable” form, flattened image files, or Rich Text Format (“RTF”) file production, may arguably suffice. But in my opinion, and Craig agrees, they are only “reasonably usable” if fully searchable and if paired with attachments. Moreover, if other metadata is needed in a particular case that is not shown in the image file, then this metadata should be preserved in a load file for the image files to be considered reasonably usable.
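To make the load file idea concrete, here is a minimal sketch in Python of what such a file might carry alongside the image files. The column names (doc_id, image_file, folder_path, sent, attachment_ids) and the sample records are my own hypothetical choices for illustration; real load file layouts are set by the review platform or by agreement between the parties.

```python
import csv
import io

# Hypothetical column names; real load-file layouts vary by platform and agreement.
FIELDS = ["doc_id", "image_file", "folder_path", "sent", "attachment_ids"]


def build_load_file(records):
    """Return CSV text with one row of metadata per produced image file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample data: one email and its attachment, both from the same folder.
records = [
    {"doc_id": "DOC0001", "image_file": "DOC0001.tif",
     "folder_path": "Inbox/Important", "sent": "2007-01-15",
     "attachment_ids": "DOC0002"},
    {"doc_id": "DOC0002", "image_file": "DOC0002.tif",
     "folder_path": "Inbox/Important", "sent": "2007-01-15",
     "attachment_ids": ""},
]

load_file_text = build_load_file(records)
print(load_file_text)
```

The point of the sketch is only that the folder location and the link between an email and its attachments travel with the images as separate fields, since the images themselves cannot show them.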
When parties have agreed to native production, and also to the preservation of metadata, then I suggest the situation is clear: that Outlook files should be produced in PST format, not MSG format. But before I complete the basis for this contention, further explanation of the terms might help. The Sedona Conference Glossary (2005) defines “native format” as follows:
Native Format: Electronic documents have an associated file structure defined by the original creating application. This file structure is referred to as the “native format” of the document.
Palgut v. City of Colorado Springs, 2006 WL 3483442 (D. Colo. Nov. 29, 2006) (previously discussed in this blog) cites to the Judge’s Guide and defines “Native format” as:
“Native format” means all documents that are created in digital format (word processing files, spreadsheets, presentations, and E-mail) have a native file format – that is, a format designed specifically for the most efficient use of the information in which this kind of software specializes.
Outlook has designed the PST format for the most efficient use of the information it creates for individual users. That is why it is the default. True, it also has an alternative MSG format, but it is not the most efficient use of the information. The most efficient use is to keep all of the emails together, organized into different folders, the way the information was originally and ordinarily maintained. Further, when you take a single email, remove it from the PST file, and put it into a standalone MSG format, you are stripping it of a key piece of metadata.
When Outlook emails are converted from their original PST format to MSG format, the metadata that shows where the email was located in the custodian’s folders is usually lost. It is equivalent to taking a filing cabinet full of paper letters, wherein the correspondence is filed and placed in appropriate drawers, files, folders and sub-folders, and then dumping them out of the drawers and folders, into one big box of mixed-up, disorganized individual letters.
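The filing cabinet comparison can also be shown with a toy model in Python. This is only a sketch: it does not read real PST or MSG files, and the Email class, its folder_path field, and the two export functions are hypothetical names I made up to stand in for the two forms of production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    subject: str
    folder_path: str  # hypothetical stand-in for the email's location in the PST


def export_with_folders(mailbox):
    """Mimic a PST-style production: each email keeps its folder path."""
    return [f"{email.folder_path}/{email.subject}.msg" for email in mailbox]


def export_as_loose_messages(mailbox):
    """Mimic a flat MSG-style production: one loose file per email, no folders."""
    return [f"{email.subject}.msg" for email in mailbox]


# Invented sample mailbox for illustration only.
mailbox = [
    Email("Q3 forecast", "Inbox/Hot"),
    Email("Lunch order", "Inbox/Unimportant"),
    Email("Vendor claim", "Inbox/Bogus"),
]

print(export_with_folders(mailbox))
print(export_as_loose_messages(mailbox))
```

In the second list nothing distinguishes the email the custodian filed under “Hot” from the one filed under “Unimportant”. That is the one big box of mixed-up letters.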
In short, you can see the original Outlook folder structure in a native format production of PST files, but cannot and will not ever know this information in MSG format production. That makes review of the MSG production substantially more difficult and expensive than review of a PST production. Further, MSG production makes it impossible to determine which emails were originally filed together, and hides the folder names created by the custodian. Thus, for instance, if a user created folders labeled “hot”, “unimportant”, and “bogus”, and then produced 100 emails from the unimportant folder, 20 from the bogus, and only 1 from the hot, this would no doubt lead to important deposition questioning.
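When the folder information does survive, the kind of tally described in that example takes only a few lines. The following Python sketch assumes the production has already been reduced to simple (folder, subject) pairs; that input shape and the tally_by_folder name are my own assumptions, not anything Outlook provides.

```python
from collections import Counter


def tally_by_folder(produced):
    """Count produced emails per custodian folder from (folder, subject) pairs."""
    return Counter(folder for folder, _subject in produced)


# Invented data matching the example: 100 unimportant, 20 bogus, 1 hot.
produced = (
    [("unimportant", f"email {i}") for i in range(100)]
    + [("bogus", f"email {i}") for i in range(20)]
    + [("hot", "email 0")]
)

counts = tally_by_folder(produced)
for folder, number in counts.most_common():
    print(f"{folder}: {number}")
```

With a production of loose MSG files there is no folder column to count, so this question cannot even be asked of the data.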
So be wary of Outlook production in individual MSG files, which some parties may insist upon as less expensive than PST production. Instead, demand PST format. This is one of many items that savvy e-discovery lawyers will want to discuss in the initial meetings under new Rules 16 and 26.