This is an old version of the document!


Index update and reindexing of the archive

If an email that is already archived is imported again, duplicate detection recognizes it. The email is not archived a second time, and the archived copy is not modified. Instead, the incoming email is analyzed by the tokenizer and re-indexed: the existing index entry for that email is deleted and replaced with the newly generated index data.

This is useful, for example, when the tokenizer gains support for a previously unknown file format (attachment type). Attachments whose contents could not be indexed because their format was unknown can then be re-indexed, making emails with these attachments easier to find.
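
The behavior described above can be pictured with a small toy model. This is only a sketch under assumptions: it identifies duplicates by a SHA-256 hash of the file and keeps one plain-text index file per email, neither of which is confirmed to be how Benno MailArchiv works internally. The names `tokenize` and `import_mail` are hypothetical, and `sha256sum` is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Toy model only: NOT Benno MailArchiv's real duplicate detection or index.

# Hypothetical stand-in for the tokenizer: one lowercase word per line.
tokenize() {
    tr '[:upper:]' '[:lower:]' < "$1" | tr -cs '[:alnum:]' '\n' | sort -u
}

# import_mail MAIL ARCHIVE_DIR INDEX_DIR
import_mail() {
    mail=$1
    archive=$2
    index=$3
    # Assumption: a content hash serves as the duplicate key.
    id=$(sha256sum "$mail" | cut -d' ' -f1)
    if [ ! -e "$archive/$id" ]; then
        # New email: archive it. A duplicate leaves the archived copy untouched.
        cp "$mail" "$archive/$id" || return 1
    fi
    # In both cases the index entry is rebuilt and replaces any older entry.
    tokenize "$mail" > "$index/$id.tmp" && mv "$index/$id.tmp" "$index/$id"
}
```

Importing the same file twice leaves a single archived copy, while the index entry reflects whatever the tokenizer produced on the most recent import.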

(Note: As of the end of 2017, the tokenizer supports approximately 1,300 different file formats.)

Re-indexing a container

To re-index a container, all emails must be read and indexed again.

A complete re-index therefore requires re-importing all emails from the repository. For the re-import, the emails are transferred from the repository directory to the inbox directory.

Re-import

The emails are stored in the repository directory as zipped files in an extended RFC822 format. The files in the archive contain a special, Benno MailArchiv-specific header with internal metadata, as well as the actual email.

For the re-index, the Benno-specific header must first be removed. The email then corresponds to the original again. After that, the email must be written "atomically" to the inbox.

The re-import can be carried out with the program benno-bennobox2eml. It searches the repo directory for the archive files, extracts the headers, and writes the emails atomically to the inbox directory.
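
To illustrate what an atomic write into the inbox means, here is a minimal shell sketch. It is not taken from benno-bennobox2eml; `deliver_atomically` and the staging directory are hypothetical. The idea is to write the file completely somewhere else on the same filesystem first and then rename it into the inbox, so that a process watching the inbox never sees a half-written email.

```shell
#!/bin/sh
# Sketch only: a generic "write elsewhere, then rename" pattern.
# deliver_atomically SRC STAGING_DIR INBOX_DIR
# Assumption: STAGING_DIR and INBOX_DIR are on the same filesystem,
# so that mv is a plain rename and not a copy.
deliver_atomically() {
    src=$1
    staging=$2
    inbox=$3
    # Create a uniquely named temporary file outside the inbox.
    tmp=$(mktemp "$staging/mail.XXXXXX") || return 1
    # Copy the full content; clean up if the copy fails.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    # The rename makes the complete file appear in the inbox in one step.
    mv "$tmp" "$inbox/$(basename "$src")"
}
```

If the staging directory were on a different filesystem, `mv` would fall back to copying and the file could become visible in the inbox before it is complete.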

benno-bennobox2eml

The program is included in benno-import-tools.

Usage: /usr/sbin/benno-bennobox2eml [-h] [-d] [-v] [-m<num> ] -a<archive directory> [-e <export directory>]
    -a        archive (repo) directory
    -b        boxstate file (default $archive_dir/boxstate.xml)
    -e        export files to directory (default current archive directory)
    -d        dry run
    -s        skip defective marked mails
    -m <num>  max files in inbox directory
    -v        verbose
    -V        print version

Example call:

sudo -u benno ./benno-bennobox2eml -a /srv/benno/archive/repo -e /srv/benno/inbox
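
Before a full re-import it may be worth doing a dry run first. The following sketch wraps the call in a small helper so that extra options from the usage text (such as `-d` and `-v`) can be passed along. The function name `reimport` and the variable `BENNOBOX2EML` are hypothetical; only the options themselves come from the usage text above.

```shell
#!/bin/sh
# Sketch only: thin wrapper around benno-bennobox2eml.
# BENNOBOX2EML is a hypothetical override; the default is the path
# shown in the usage text.
BENNOBOX2EML=${BENNOBOX2EML:-/usr/sbin/benno-bennobox2eml}

# reimport REPO_DIR INBOX_DIR [extra options...]
reimport() {
    repo=$1
    inbox=$2
    shift 2
    # Extra options first, then archive (-a) and export (-e) directories.
    "$BENNOBOX2EML" "$@" -a "$repo" -e "$inbox"
}

# Possible usage, run as the benno user:
#   reimport /srv/benno/archive/repo /srv/benno/inbox -d -v   # dry run, verbose
#   reimport /srv/benno/archive/repo /srv/benno/inbox -v      # actual re-import
```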
re-indexierung.1615383203.txt.gz · Last modified: 2021/03/10 13:33 by lwsystems