If the same email is delivered to Benno MailArchiv for archiving more than once, Benno MailArchiv's duplicate detection takes effect.
During archiving, Benno MailArchiv generates a SHA256 checksum for each archived email. This is logged internally in the Benno MailArchiv journal and used for consistency checks (compliance).
The checksum is always generated over the entire email. This makes it possible to check, at the moment an email is archived, whether a message with the same checksum already exists in the archive. An identical checksum would mean that the email to be archived has already been stored in the archive, i.e. that it is a duplicate. In that case, archiving of the email is aborted and the detection of the duplicate is logged in the archiving log file /var/log/benno/archive.log. In addition, the skipped archiving of the duplicate is recorded in the journal (/srv/benno/archive/repo/{yyyy}/journal/current.journal) together with the checksum and the note "DUPLICATE".
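The check described above can be sketched in a few lines of Python. This is only an illustration of the principle, not Benno MailArchiv's actual code: the class `ArchiveSketch`, its in-memory storage, the journal tuple layout and the status value "ARCHIVED" are assumptions made for this example. Only the use of SHA256 over the whole message and the "DUPLICATE" note come from the description above.

```python
import hashlib


def sha256_of_message(raw_message: bytes) -> str:
    """Return the hex SHA-256 digest over the complete raw message."""
    return hashlib.sha256(raw_message).hexdigest()


class ArchiveSketch:
    """Hypothetical in-memory stand-in for an archive (illustration only)."""

    def __init__(self) -> None:
        self.stored = {}   # checksum -> raw message bytes
        self.journal = []  # (checksum, status) tuples; layout is an assumption

    def submit(self, raw_message: bytes) -> bool:
        """Archive the message unless its checksum is already known.

        Returns True if the message was stored, False if it was a duplicate.
        """
        checksum = sha256_of_message(raw_message)
        if checksum in self.stored:
            # Same checksum already present: abort and note the duplicate.
            self.journal.append((checksum, "DUPLICATE"))
            return False
        self.stored[checksum] = raw_message
        self.journal.append((checksum, "ARCHIVED"))  # status name is hypothetical
        return True
```

Submitting the same bytes twice to such an object stores them once and adds a second journal entry marked "DUPLICATE".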
Thanks to this duplicate detection, emails can be submitted to Benno MailArchiv for archiving as often as desired. They are reliably identified as duplicates and their archiving is aborted, so each email is archived only once. With a SHA256 checksum, it is practically impossible for two different emails to be mistaken for duplicates and for one of them to be erroneously left unarchived. Because the checksum is generated over the entire email, it also serves the consistency check of individual archived emails and of the archive as a whole: changing even a single bit of an archived email alters its checksum and thus indicates corruption of the email or the archive.
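The single-bit sensitivity can be demonstrated with a short Python sketch. The helper names `flip_bit` and `is_intact` and the sample message are made up for this illustration; how Benno MailArchiv performs its consistency check internally is not shown here.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    mutable = bytearray(data)
    mutable[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(mutable)


def is_intact(raw_message: bytes, recorded_checksum: str) -> bool:
    """Compare the current SHA-256 digest with the one recorded at archiving."""
    return hashlib.sha256(raw_message).hexdigest() == recorded_checksum


# Hypothetical sample message and the checksum recorded when it was archived.
original = b"Subject: Invoice 42\r\n\r\nPlease find the invoice attached.\r\n"
recorded = hashlib.sha256(original).hexdigest()

# Simulate corruption of a single bit somewhere in the stored message.
corrupted = flip_bit(original, 100)
```

Here `is_intact(original, recorded)` holds, while `is_intact(corrupted, recorded)` does not, although both byte strings have the same length and differ in only one bit.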
While emails typically (and especially in on-premises installations) reach the mail archive via a single, uniform path, in complex environments (e.g., in larger hosting infrastructures) emails may be transported to the archive multiple times and simultaneously via different paths. For example, different MTAs or transport methods and types (SMTP, IMAP, etc.) could be responsible for this.
In this case, it is advisable to configure a simplified checksum that is generated over individual headers of the email.
In a highly complex infrastructure, Benno MailArchiv receives a specific email "M" for archiving via three different routes. Although each copy is transported via a different route, the email itself (from the user's perspective in the mail client, i.e., in terms of text and content) is the same. However, in the part of the email that is typically not visible to the user, each copy carries its own, different email headers that were inserted along its transport route.
For Benno MailArchiv's duplicate detection, however, these three emails (which appear identical from the user's perspective) are three different emails: generating the SHA256 checksum for each copy yields the three checksums "C1", "C2", and "C3". Because the headers of each copy differ, the checksums differ, and Benno MailArchiv logically and correctly identifies the copies as three different emails. It would therefore archive them as three separate emails. From the user's perspective, the email would then be visible three times in the archive, because all three copies are found and displayed for the same search criteria based on the email body.
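The effect can be reproduced with a small Python sketch. The message text, the addresses and the `Received` lines are invented for illustration; real transport headers look different and are usually more numerous.

```python
import hashlib

# Headers and body that are the same in every copy (hypothetical sample data).
COMMON_HEADERS = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Meeting\r\n"
    "Message-Id: <m1@example.com>\r\n"
)
BODY = "\r\nHello Bob,\r\nsee you at 10.\r\n"


def copy_via(route: str) -> bytes:
    """Build one copy of the message as it might arrive via the given route."""
    received = f"Received: from {route}.example.net by archive.example.net\r\n"
    return (received + COMMON_HEADERS + BODY).encode("ascii")


# The same email arriving over three different (made-up) routes.
copies = [copy_via(route) for route in ("mx1", "mx2", "imap-import")]

# A checksum over the whole message differs for every copy (C1, C2, C3).
checksums = {hashlib.sha256(copy).hexdigest() for copy in copies}
```

The set `checksums` contains three distinct values, even though the visible content of all three copies is identical.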
To achieve a suitable implementation in this situation (archiving each email only once), the following scenario is recommended:
An email is uniquely identified by the headers listed below and additionally by the body (message text) of the email:
Envelope-From - X-REAL-MAILFROM
Envelope-To - X-REAL-RCPTTO
Return-Path
Subject
Message-Id
Date
From
To
Cc
Body
Two emails, M1 and M2, which do not differ with respect to the aforementioned characteristics, are identical in terms of content and sender/recipient assignment. If any of these fields differ, the two emails, M1 and M2, are not identical.
Other email headers, such as Received, DKIM signatures, etc., are not directly related to the email's content. These headers are more a part of an email's envelope (similar to stamps and sticky notes on a contract being processed by a company).
Based on this situation, the checksum calculation would be performed in two ways. First, the standard checksum required for compliance policies would be generated for the entire email (as before). Simultaneously, a second checksum would be generated exclusively for the portion of the email specified above. This would easily enable duplicate detection for emails that appear identical from the user's perspective.
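The two-checksum idea could look roughly like the following Python sketch. It is written under several assumptions and is not Benno MailArchiv's implementation: the envelope sender and recipient are assumed to be available as the headers X-REAL-MAILFROM and X-REAL-RCPTTO, and the function names, the header normalisation (lower-cased name, stripped value) and the way the body is split off at the first blank line are choices made for this example only.

```python
import hashlib
import re
from email import message_from_bytes

# Headers that identify an email in the sense described above.
IDENTITY_HEADERS = (
    "X-REAL-MAILFROM", "X-REAL-RCPTTO", "Return-Path", "Subject",
    "Message-Id", "Date", "From", "To", "Cc",
)


def full_checksum(raw: bytes) -> str:
    """Standard checksum over the entire message (headers and body)."""
    return hashlib.sha256(raw).hexdigest()


def simplified_checksum(raw: bytes) -> str:
    """Checksum over the identity headers and the body only.

    Transport headers such as Received or DKIM signatures are ignored.
    """
    msg = message_from_bytes(raw)
    # The body is everything after the first blank line of the raw message.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    body = parts[1] if len(parts) == 2 else b""

    digest = hashlib.sha256()
    for name in IDENTITY_HEADERS:
        for value in msg.get_all(name, []):
            line = f"{name.lower()}:{str(value).strip()}\n"
            digest.update(line.encode("utf-8", errors="replace"))
    digest.update(b"\n")  # separator between header part and body
    digest.update(body)
    return digest.hexdigest()
```

For two copies that differ only in their `Received` headers, `full_checksum` returns different values while `simplified_checksum` returns the same value. A change to any of the listed headers or to the body changes the simplified checksum as well.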
According to the German principles of proper accounting (GoBD), every email must be able to be restored from the archive in its original state (i.e., the email including all headers, attachments, etc.). Furthermore, every email must be verifiable for any manipulation, which is achieved using the standard checksum.
However, if several emails (identical in content and text) M1, M2 and M3 (different copies of the same email in the above sense) arrive for archiving, how should these be handled with regard to their different headers?
According to the information available to us, there is no legal obligation to archive multiple versions of an email that differ only in their headers. Nevertheless, from a pragmatic point of view, all copies of the email in question (M1, M2, M3, etc.) should be archived. From a purely formal standpoint (and immediately verifiable technically based on the different checksums), these are de facto different emails. Therefore, for legal certainty, all versions of the email should be archived. Technically, using two checksums—that is, the simplified duplicate detection described above—it would be easy to archive only the first of the email copies with identical content.
In order to implement a legally compliant solution for the operator, we advise discussing the matter with a legal advisor of your choice before implementation and only then deciding on and implementing the specific form of duplicate detection.
Until further notice, we assume that it could be legally sufficient to apply the simplified duplicate detection and thus to archive only one of several copies of an email that arrive at the archive. We also assume, until further notice, that a suitable explanation or written record of the matter in the procedural documentation (which is mandatory under the GoBD) should be sufficient to achieve legally compliant archiving.
The decision on the type of duplicate detection applied, and with it the responsibility towards the tax authorities, rests solely with the operator.
This document does not constitute legal advice. It serves only for general information purposes. We assume no liability for the accuracy or completeness of the information provided. All liability is excluded.