If an email that is already archived is imported again, duplicate detection recognizes it. The email is not archived a second time, and the archived copy is not modified. Instead, the tokenizer analyzes the email being imported and re-indexes it: the existing index entry for that email is deleted and replaced with the newly generated index data.
This approach is useful, for example, when the tokenizer gains support for a file format (attachment type) that it did not previously recognize. Attachments whose contents could not be indexed because the format was unknown can then be re-indexed easily, which makes the emails containing them easier to find.
(Note: As of the end of 2017, the tokenizer supports approximately 1,300 different file formats.)
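The re-import behaviour described above can be illustrated with a small in-memory model. This is only a sketch: the class `ToyArchive`, its attributes, and the use of a SHA-256 digest as the duplicate key are assumptions made for illustration and do not describe how Benno MailArchiv actually detects duplicates or stores its index.

```python
import hashlib


class ToyArchive:
    """Hypothetical in-memory stand-in for an archive with a separate index."""

    def __init__(self, tokenizer):
        self.stored = {}  # digest -> raw email bytes (written once, never changed)
        self.index = {}   # digest -> tokens produced by the tokenizer
        self.tokenizer = tokenizer

    def import_email(self, raw: bytes) -> str:
        # Assumption: a content digest serves as the duplicate key.
        digest = hashlib.sha256(raw).hexdigest()
        if digest in self.stored:
            # Duplicate: the archived copy stays untouched.
            status = "reindexed"
        else:
            self.stored[digest] = raw
            status = "archived"
        # In both cases the old index entry is replaced by fresh tokenizer output.
        self.index[digest] = self.tokenizer(raw)
        return status
```

Importing the same bytes a second time with an improved tokenizer leaves `stored` unchanged and only replaces the entry in `index`.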
To re-index a container completely, all of its emails must be read and indexed again. This means re-importing every email from the repository: for the re-import, the emails are transferred from the repository to the Inbox directory.
The emails are stored in the repository directory as zipped files in an extended RFC822 format. The files in the archive contain a special, Benno MailArchiv-specific header with internal metadata, as well as the actual email.
For the re-index, the Benno-specific header must first be removed, which restores the email to its original form. The email must then be written "atomically" to the inbox.
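Writing a file "atomically" usually means that a reader of the target directory never sees a half-written file. A common way to achieve this is to write the data to a temporary file elsewhere on the same filesystem and then rename it into place. The following is a minimal sketch of that general technique, not the code of the Benno tools; the function name `write_atomically`, the separate staging directory, and the `.eml` file name in the usage note are assumptions.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(data: bytes, inbox_dir: str, filename: str, staging_dir: str) -> Path:
    """Write data to staging_dir first, then rename it into inbox_dir.

    staging_dir must be on the same filesystem as inbox_dir; otherwise
    the rename is not atomic and os.replace raises OSError.
    """
    target = Path(inbox_dir) / filename
    # Create the temporary file outside the inbox so an importer never sees it.
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # atomic on POSIX within one filesystem
    except BaseException:
        # Remove the partial file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

A call such as `write_atomically(mail_bytes, "/srv/benno/inbox", "mail1.eml", "/srv/benno/staging")` would make the complete file appear in the inbox in a single step; the staging path here is hypothetical.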
To simplify the import, we provide the Perl script benno-bennobox2eml. It searches the repo directory for the archive files, extracts the headers, and writes the emails atomically to the inbox directory.
Usage: ./benno-bennobox2eml [-h] [-d] [-v] [-m <num>] -a <archive directory> [-e <export directory>]
  -a        archive directory
  -e        export files to this directory (default current archive directory)
  -d        dry run
  -m <num>  max files in inbox directory
  -v        verbose
Example call:
./benno-bennobox2eml -a /srv/benno/archive/repo -e /srv/benno/inbox
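The -m option in the usage text caps the number of files in the inbox directory. How the script enforces this cap is not described here. One plausible approach, sketched below under that assumption, is to count the files in the inbox and pause until the count drops below the limit. The function names, the polling interval, and the `max_polls` parameter are hypothetical.

```python
import os
import time


def count_files(directory: str) -> int:
    """Count regular files directly inside directory (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def wait_for_capacity(inbox_dir: str, max_files: int,
                      poll_seconds: float = 5.0, max_polls=None) -> bool:
    """Wait until the inbox holds fewer than max_files files.

    Returns True once there is room, or False if max_polls checks
    passed without the inbox draining (None means wait indefinitely).
    """
    polls = 0
    while count_files(inbox_dir) >= max_files:
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_seconds)
        polls += 1
    return True
```

An exporter could call `wait_for_capacity` before writing each email, so that the importer is never flooded with more files than it can process.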