Training SpamAssassin's Bayes filter with Proxmox Mail Gateway



NOTE: An updated script for finding mail in Dovecot is available for step 3 below. It uses doveadm and works with any mail storage backend.

One of the problems with Bayes filters is that they need to be trained on both ham and spam. Proxmox Mail Gateway only uses the Bayes filter for messages that initially pass, so there is no way to force it to learn from spam that slipped through, which leaves a hole in the training.

Here are the steps for adding that feedback loop for sa-learn.

1) On the PMG Server, create the following script as /root/bin/remote-commands, then chmod +x /root/bin/remote-commands to make it executable:

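#!/bin/sh
# Forced command for the restricted SSH key. The message arrives on stdin
# and is handed to sa-learn, which reads it from there.
case "$SSH_ORIGINAL_COMMAND" in
        report)
                sa-learn --spam
                ;;
        revoke)
                sa-learn --ham
                ;;
        *)
                # Fail with a non-zero status so the caller can detect the error.
                echo "Invalid command?" >&2
                exit 1
                ;;
esac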

Still on the PMG server, configure Bayes in /etc/mail/spamassassin/custom.cf as follows:

use_bayes_rules 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.5
bayes_auto_learn_threshold_spam 6
bayes_path /root/.spamassassin/bayes
bayes_file_mode 0775
bayes_auto_expire 1
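
SpamAssassin only applies Bayes scores once the database holds enough messages of both classes (200 spam and 200 ham by default, per bayes_min_spam_num and bayes_min_ham_num), so it is worth watching the counters while training ramps up. The helper below is a minimal sketch that assumes the usual layout of sa-learn --dump magic output, with the count in the third column. The name bayes_counts is made up for this example and is not part of SpamAssassin.

```shell
#!/bin/sh
# bayes_counts (hypothetical helper): read `sa-learn --dump magic` output on
# stdin and print how many spam and ham messages have been learned so far.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { spam = $3 }
        /non-token data: nham$/  { ham  = $3 }
        END { printf "nspam=%d nham=%d\n", spam, ham }
    '
}
```

Run as root on the PMG server, it would be used as sa-learn --dump magic | bayes_counts.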

2) Create an SSH key pair and put the private key on the end mail server. Then add the public key to /root/.ssh/authorized_keys on the PMG server, prefixed so that it can only run the restricted command:

command="/root/bin/remote-commands" ssh-rsa AAAA....rest-of-key... root@mail

You can further restrict this key to a set of IP addresses with the from= option, documented under the authorized_keys file format in sshd(8).
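
As a sketch of what a tightened entry could look like, the helper below builds an authorized_keys line that combines from= with the forced command. It also adds the restrict option, which assumes OpenSSH 7.2 or later and turns off port, agent and X11 forwarding as well as PTY allocation. The function name, the 192.0.2.10 address and the key file name in the usage note are placeholders for illustration only.

```shell
#!/bin/sh
# restricted_key_line (hypothetical helper): print an authorized_keys entry
# that pins a public key to one source address and to the forced command.
#   $1 = allowed source address or pattern of the mail server
#   $2 = the public key line, e.g. the contents of the .pub file
restricted_key_line() {
    printf 'from="%s",command="/root/bin/remote-commands",restrict %s\n' "$1" "$2"
}
```

For example, restricted_key_line 192.0.2.10 "$(cat pmg-report.pub)" >> /root/.ssh/authorized_keys would append the entry on the PMG server.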

3) On the mail server, add the following script as /root/bin/spam-reporter and make it executable. It assumes that the mail directories on the target system live under /mail/username in Maildir format and that the end user's IMAP spam folder is called "Spam". Change both as required for your install. The script handles the message formats that Dovecot uses (plain, gzip compressed or bzip2 compressed) and can be expanded if needed.

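#!/bin/bash
MAILFILTER=<ip of PMG install>

# Private temp file, removed again however the script exits.
TMPFILE=$(mktemp /tmp/tempmail.XXXXXX) || exit 1
trap 'rm -f "$TMPFILE"' EXIT

for i in /mail/*/.Spam/cur/* /mail/*/.Spam/new/*; do
        if [ -f "$i" ]; then
                STATUS=$(file "$i")
                if [[ $STATUS == *"gzip"* ]]; then
                        gunzip -c "$i" > "$TMPFILE"
                elif [[ $STATUS == *"bzip2"* ]]; then
                        bzip2 -d -c "$i" > "$TMPFILE"
                elif [[ $STATUS == *"SMTP mail"* ]]; then
                        cp "$i" "$TMPFILE"
                else
                        # Unknown type: leave the message in place instead of
                        # reporting an empty body and deleting it.
                        echo "Skipping unrecognised file: $i" >&2
                        continue
                fi

                # Feed the decoded message to the restricted command on PMG.
                if ! ssh "root@$MAILFILTER" report < "$TMPFILE"; then
                        echo "Error running sa-learn. Aborting." >&2
                        exit 1
                fi
                rm -f "$i"
        fi
done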
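
The label that file prints for an uncompressed message can vary with the first header line and the version of file, so a plain message may not always be reported as "SMTP mail". One possible alternative, shown here only as a sketch, is to look at the leading magic bytes directly: gzip data starts with 1f 8b and bzip2 data with the ASCII characters "BZh". The function name message_format is hypothetical, and anything that is not recognisably compressed is treated as plain.

```shell
#!/bin/sh
# message_format (hypothetical helper): classify a stored message by its
# first three bytes. Prints "gzip", "bzip2" or "plain".
message_format() {
    # Hex dump of the first three bytes with all whitespace removed.
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)  echo gzip ;;
        425a68) echo bzip2 ;;
        *)      echo plain ;;
    esac
}
```

Inside the loop, the result of message_format "$i" could then select between gunzip, bzip2 and cp in place of the file checks.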

4) If you're going to use systemd timers to run it, create /etc/systemd/system/spam-reporter.service with the following:

[Unit]
Description=This service automatically reports spam.

[Service]
Type=oneshot
ExecStart=/root/bin/spam-reporter

Then the timer unit as /etc/systemd/system/spam-reporter.timer:

[Unit]
Description=This is the timer to check for spam and report it.

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target

Then enable the timer with systemctl daemon-reload && systemctl enable spam-reporter.timer --now.

That's it! Now if your users throw mail in the Spam IMAP folder, it'll get fed back into PMG's Bayes filter as spam.
