New Advances in Enterprise Software for Next Generation Content Archiving

Date: Tuesday, February 03, 2009

The commercially available email archive solutions perform the basic task of collecting email and storing it in an indexed database for fast search and retrieval. The data collection methods these solutions use determine the breadth of information retained and the impact on Exchange Servers.

Challenges of Traditional Email Archiving Applications

At a basic level, commercially available email archiving applications provide an indexed database to catalog records, search and retrieve them, and apply security. Archiving applications reduce the burden of managing old email and allow users to access the archive for search and retrieval instead of querying the messaging server. Typical comparisons of email archiving products delve into the details of storage, search, and security and pay little attention to what method is used to collect data from the messaging servers. The data collection methods vary by the richness of message information they collect and the extent to which they impact Exchange performance.

Simple Mail Transfer Protocol (SMTP) Data Collection is a favorite method of email archival service providers. It intercepts email at the gateway server, while the message is in SMTP format. This method has the advantage of collecting email at a single point, and because SMTP is supported by all message servers, it works with all the popular messaging platforms (e.g., Microsoft Exchange and IBM Lotus Notes). All data collection processing is performed on the gateway and does not burden the messaging server.

The drawback of SMTP Data Collection is that it either captures only inbound and outbound messages or forces all internal messages to be relayed through the gateway as well, thereby increasing the load on the messaging server. An SMTP message also carries only basic content information: date, time, from, to, cc, subject, body, and attachment. Beyond this basic content, there is rich information about the context and lifecycle of the message, such as folder location, flags, rich text, settings, replies, forwards, edits, opens, deletes, and folder changes, that SMTP Data Collection does not capture.
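To make the limitation concrete, the following is a minimal sketch, using Python's standard `email` package, of what a collector at the gateway can read from a message in SMTP format. The sample message, the `summarize` function, and the `NOT_IN_SMTP` list are illustrative assumptions and are not taken from any archiving product.

```python
from email import message_from_string

# Hypothetical message as it might appear at the gateway.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Cc: carol@example.com
Date: Tue, 03 Feb 2009 09:30:00 +0000
Subject: Quarterly report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="us-ascii"

Please find the report attached.
--BOUNDARY
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="report.csv"

region,total
east,42
--BOUNDARY--
"""

# Context and lifecycle details named in the article that live in the
# user's mailbox, not in the transmitted message.
NOT_IN_SMTP = ("folder location", "flags", "opens", "deletes", "folder changes")


def summarize(raw: str) -> dict:
    """Return the basic content fields available in an SMTP-format message."""
    msg = message_from_string(raw)
    body = ""
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container part, holds no content of its own
        filename = part.get_filename()
        if filename:
            attachments.append(filename)
        elif part.get_content_type() == "text/plain":
            body = part.get_payload().strip()
    return {
        "date": msg["Date"],
        "from": msg["From"],
        "to": msg["To"],
        "cc": msg["Cc"],
        "subject": msg["Subject"],
        "body": body,
        "attachments": attachments,
    }


if __name__ == "__main__":
    for field, value in summarize(RAW_MESSAGE).items():
        print(f"{field}: {value}")
    print("not available at the gateway:", ", ".join(NOT_IN_SMTP))
```

Everything the sketch returns comes from the headers and MIME parts of the message itself. Nothing in the message records where the recipient later filed it or whether it was opened, forwarded, or deleted.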

Microsoft's Messaging Application Programming Interface (MAPI) is the Microsoft-supported method for reading messages in Microsoft Exchange Server. Traditional email archive solutions use MAPI to scan messages in the Message Store, copy the messages to the archive, and index them for fast search and retrieval.

The major drawback of MAPI is the performance burden it places on the Exchange Server. Email archiving solutions use two methods to lessen this burden. The first is to restrict the amount of information collected for each message. Each message consists of its header, body, and attachment (if any), plus over 400 individual properties that hold its context and lifecycle information. Because of the time MAPI takes to read the message content alone, the message properties are not collected. This reduces the overall burden of MAPI on the Exchange Server but limits the amount of information available for regulatory and legal discovery analysis.

The second method is to schedule the archiving application to run on the weekend, when email use is low. The drawback of this scheduled approach is that messages sent, received, and deleted during the week are not available for archiving. Only messages that exceed a certain age (e.g., 60 days) and still remain in mailboxes are copied to the archive server. Organizations that must retain ‘all email’ for compliance and run MAPI Data Collection on a weekly basis use Exchange Journaling to capture email traffic during the week.
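The gap left by the scheduled approach can be illustrated with a short sketch. The `MailboxMessage` type, the `select_for_archive` function, and the sample dates below are hypothetical, chosen only to mirror the 60-day example above; they do not describe how any particular product selects messages.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class MailboxMessage:
    subject: str
    received: date
    deleted_on: Optional[date] = None  # None means still in the mailbox


def select_for_archive(
    messages: List[MailboxMessage], run_date: date, min_age_days: int = 60
) -> List[MailboxMessage]:
    """Pick messages older than the age threshold that still exist on run_date."""
    cutoff = run_date - timedelta(days=min_age_days)
    return [
        m
        for m in messages
        if m.received < cutoff
        and (m.deleted_on is None or m.deleted_on > run_date)
    ]


mailbox = [
    MailboxMessage("old contract", date(2008, 11, 1)),
    MailboxMessage("recent memo", date(2009, 2, 2)),
    # Old enough to qualify, but deleted mid-week before the weekend run.
    MailboxMessage("deleted note", date(2008, 11, 15), deleted_on=date(2009, 2, 4)),
]

weekend_run = date(2009, 2, 7)  # a Saturday
selected = select_for_archive(mailbox, weekend_run)
print([m.subject for m in selected])  # ['old contract']
```

Only the message that is both old enough and still present at run time reaches the archive. The recent memo is too young, and the note deleted during the week is never seen by the scheduled scan, which is the traffic Journaling is used to capture.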

Microsoft Exchange Journaling archives all messages sent and received on any given Message Store. The first drawback of Journaling is the CPU and storage burden it puts on the Exchange Server. Journaling captures one additional copy of each message sent or received and forces internal messages to take the expensive route that outbound messages take, even if the recipients are on the same server. To manage Journaling for compliance, email archiving solutions use MAPI to read the Journal mailboxes and truncate their contents. Without this truncation, the Journal mailboxes would quickly fill and consume valuable Exchange storage space.

The second drawback of Journaling is its lack of rich message information. The Journal mailbox is a stripped-down version that bears no similarity to the assigned mailbox except for the content of the messages. Journaling sees a message only when it is sent or received; after that it loses all contact with the message and its history. Journaling cannot capture the rich lifecycle information contained in the user’s mailbox (opens, replies, forwards, etc.), information that is extremely valuable to internal investigations and eDiscovery.

Moving Forward

The next generation of archiving applications needs to address the drawbacks of the traditional approaches described above. It should provide cost-effective, efficient archiving that retains the rich context and lifecycle information of each message; comply with regulatory requirements; enable legal staff and investigators to easily search and review message content and enforce litigation holds; and minimize the storage burden.

The author is Director - Product Marketing, Mimosa Systems