If an email that has already been archived is imported again, duplicate detection recognizes it. The email is not archived a second time, and the archived copy is not modified. Instead, the tokenizer analyzes the email being imported and re-indexes it: the existing index entry for that email is deleted and replaced with the newly generated index data.
This approach is useful, for example, when the tokenizer gains support for a previously unknown file format (attachment type). Attachments whose contents could not be indexed because their format was unknown can then be re-indexed simply by importing the emails again, which makes emails with these attachments easier to find.
(Note: As of the end of 2017, the tokenizer supported approximately 1,300 different file formats.)
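The re-import behavior described above can be pictured with a small toy model. This is only a sketch and not Benno's actual implementation: the class `ToyArchive`, the two tokenizer functions, the `--ATTACHMENT--` marker, and the use of a SHA-256 content hash for duplicate detection are all assumptions made for illustration.

```python
import hashlib


class ToyArchive:
    """Toy model of archive-once, re-index-on-duplicate (hypothetical)."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer  # callable: bytes -> set of search terms
        self.repo = {}              # digest -> raw email, written once, never modified
        self.index = {}             # digest -> set of search terms

    def import_email(self, raw: bytes) -> str:
        # Assumption: a duplicate is recognized by its content hash.
        digest = hashlib.sha256(raw).hexdigest()
        if digest in self.repo:
            status = "reindexed"    # already archived: leave the stored copy alone
        else:
            self.repo[digest] = raw
            status = "archived"
        # In both cases the index entry is replaced with fresh tokenizer output.
        self.index[digest] = self.tokenizer(raw)
        return status


def old_tokenizer(raw: bytes) -> set:
    # Pretends not to understand the attachment after the marker.
    body, _, _attachment = raw.partition(b"--ATTACHMENT--")
    return set(body.decode().lower().split())


def new_tokenizer(raw: bytes) -> set:
    # Pretends the attachment format is now supported.
    return set(raw.replace(b"--ATTACHMENT--", b" ").decode().lower().split())


mail = b"Subject: report\n--ATTACHMENT--\nquarterly figures"
archive = ToyArchive(old_tokenizer)
first = archive.import_email(mail)
terms_before = set(next(iter(archive.index.values())))

archive.tokenizer = new_tokenizer   # the tokenizer learns the new format
second = archive.import_email(mail)
terms_after = next(iter(archive.index.values()))

print(first, second, len(archive.repo))
print(sorted(terms_after - terms_before))
```

In this model the second import leaves the repository with a single stored email, while the terms from the attachment become part of the index entry.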
To re-index a container completely, all emails must be read and indexed again. This requires re-importing every email from the repository: for the re-import, the emails are transferred from the repository to the Inbox directory.
In the repository directory, the emails are stored as zipped files in an extended RFC822 format. The files contain a special header with internal meta-information as well as the actual email.
For the re-index, this header must first be stripped off. The email must then be written "atomically" to the Inbox.
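What "atomically" means here can be illustrated with the common write-then-rename pattern. The following is a minimal sketch, not the code used by Benno: the function name `write_atomically` and its parameters are hypothetical, and stripping the internal header is deliberately left out because its exact layout is not described here.

```python
import os
import tempfile


def write_atomically(data: bytes, inbox_dir: str, filename: str, tmp_dir: str) -> str:
    """Make inbox_dir/filename appear in a single step with complete content.

    Assumption: tmp_dir is on the same filesystem as inbox_dir. Across
    filesystems the final rename may fail and would not be atomic.
    """
    # Write the full content to a temporary file outside the inbox first.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # push the data to disk before renaming
        final_path = os.path.join(inbox_dir, filename)
        # The rename is the only step that is visible inside the inbox.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Because the file only becomes visible in the inbox through the rename, a process watching the inbox never sees a partially written email. Note that `tempfile.mkstemp` creates files readable only by their owner, so permissions may need adjusting if the importer runs under a different account.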
To simplify the import, we provide the Perl script benno-bennobox2eml. It searches the repository directory for the archive files, strips the headers, and writes the emails atomically to the Inbox directory.
Usage: ./benno-bennobox2eml [-h] [-d] [-v] [-m <num>] -a <archive directory> [-e <export directory>]
  -a        archive directory
  -e        export files to this directory (default: current archive directory)
  -d        dry run
  -m <num>  max files in inbox directory
  -v        verbose
Example call:
./benno-bennobox2eml -a /srv/benno/archive/repo -e /srv/benno/inbox