Training SpamAssassin's Bayes filter with Proxmox Mail Gateway
NOTE: An updated script is available for finding mail in Dovecot for Step 3 below. It uses doveadm and can be used with any mail storage backend.
One problem with Bayes filters is that they need to be trained on both ham and spam. Because Proxmox Mail Gateway only applies the Bayes filter to messages that initially pass, there is no built-in way to force it to learn spam, which leaves a gap in training.
Here are the steps for adding that feedback loop with sa-learn.
1) On the PMG server, create the following script as /root/bin/remote-commands, then run chmod +x /root/bin/remote-commands to make it executable:
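#!/bin/sh
# Forced command for the reporting SSH key. sshd places the command the
# client asked for in SSH_ORIGINAL_COMMAND; the message arrives on stdin.
case "$SSH_ORIGINAL_COMMAND" in
    report)
        # Learn the message on stdin as spam
        sa-learn --spam
        ;;
    revoke)
        # Learn the message on stdin as ham
        sa-learn --ham
        ;;
    *)
        echo "Invalid command?"
        # Non-zero exit so the caller can tell the request was rejected
        exit 1
        ;;
esac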
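To see how the dispatch behaves before wiring it up to SSH, here is a minimal dry-run sketch. It is not the script above: learn is a hypothetical stub standing in for sa-learn, and SSH_ORIGINAL_COMMAND is set by hand rather than by sshd.

```shell
#!/bin/sh
# Hypothetical stub: prints what would be run instead of calling sa-learn.
learn() { echo "would run: sa-learn $*"; }

# Same case logic as the forced-command script, wrapped in a function.
dispatch() {
    case "$1" in
        report) learn --spam ;;
        revoke) learn --ham ;;
        *) echo "Invalid command?"; return 1 ;;
    esac
}

# sshd would normally set SSH_ORIGINAL_COMMAND; here it is set by the loop.
for SSH_ORIGINAL_COMMAND in report revoke "cat /etc/shadow"; do
    dispatch "$SSH_ORIGINAL_COMMAND" || echo "rejected: $SSH_ORIGINAL_COMMAND"
done
```

Anything other than report or revoke falls through to the catch-all branch, so the key cannot be used to run arbitrary commands as root.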
Then configure the Bayes settings in /etc/mail/spamassassin/custom.cf as follows:
use_bayes_rules 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.5
bayes_auto_learn_threshold_spam 6
bayes_path /root/.spamassassin/bayes
bayes_file_mode 0775
bayes_auto_expire 1
2) Create an SSH key pair, put the private key on the end mail server, then add the public key to /root/.ssh/authorized_keys on the PMG server and force it to use the restricted command:
command="/root/bin/remote-commands" ssh-rsa AAAA....rest-of-key... root@mail
You can further restrict this key to a set of IP addresses by using the from= option, as documented for the authorized_keys file format.
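As a rough sketch of what a more tightly restricted entry could look like, the helper below prints an authorized_keys line that combines from= with the forced command and a few of OpenSSH's no-* options. The function name make_entry and the address 192.0.2.10 are placeholders for illustration only; substitute your mail server's address and real public key.

```shell
#!/bin/sh
# Hypothetical helper: build one authorized_keys entry.
#   $1 = address or pattern the mail server connects from
#   $2 = the public key line, unchanged
make_entry() {
    printf 'from="%s",command="/root/bin/remote-commands",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding %s\n' "$1" "$2"
}

# Example values only; 192.0.2.10 is a documentation address.
make_entry "192.0.2.10" "ssh-rsa AAAA....rest-of-key... root@mail"
```

The printed line can be appended to /root/.ssh/authorized_keys on the PMG server in place of the simpler entry.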
3) On the mail server, add the following script as /root/bin/spam-reporter. It assumes a number of things: the mail directories on the target system are laid out as /mail/username in Maildir format, and the end user's IMAP spam folder is named “Spam”. You can change these as required for your install. The script handles the message formats that Dovecot uses (plain, gzip-compressed or bzip2-compressed) and could be expanded if needed. Note that each message is deleted from the Spam folder once it has been reported.
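#!/bin/bash
# Feed every message in each user's Spam folder to sa-learn on the PMG host,
# then delete it locally.
MAILFILTER=<ip of PMG install>
TMPFILE=/tmp/tempmail.$$

for i in /mail/*/.Spam/cur/* /mail/*/.Spam/new/*; do
    # Skip unmatched globs and anything that is not a regular file
    [ -f "$i" ] || continue

    # -b omits the file name, so only the detected type is matched
    STATUS=$(file -b "$i")
    case "$STATUS" in
        *gzip*)  gunzip -c "$i" > "$TMPFILE" ;;
        *bzip2*) bzip2 -d -c "$i" > "$TMPFILE" ;;
        # Anything else is treated as a plain message, so a type that
        # file does not label "SMTP mail" is still sent with its content
        *)       cp "$i" "$TMPFILE" ;;
    esac

    if ! ssh "root@$MAILFILTER" report < "$TMPFILE"; then
        echo "Error running sa-learn. Aborting."
        rm -f "$TMPFILE"
        exit 1
    fi
    rm -f "$i" "$TMPFILE"
done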
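The format detection above depends on the wording of the file command's output. As an alternative sketch (not what the script above does), the compression can be detected from the first bytes of the message instead: gzip data starts with 1f 8b and bzip2 data with "BZh". The function name decode_message is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: write a stored message to stdout, decompressing it
# when the first bytes carry a gzip (1f 8b) or bzip2 ("BZh" = 42 5a 68) signature.
decode_message() {
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)  gzip -dc "$1" ;;
        425a68) bzip2 -dc "$1" ;;
        *)      cat "$1" ;;
    esac
}

# Small demonstration with a throwaway gzip-compressed message.
demo=$(mktemp)
printf 'Subject: test\n\nbody\n' | gzip -c > "$demo"
decode_message "$demo"
rm -f "$demo"
```

A helper like this could replace the temporary file by piping its output straight into ssh, with the caveat that a plain pipeline only reports the exit status of its last command.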
4) If you’re going to use systemd timers, create /etc/systemd/system/spam-reporter.service with the following:
[Unit]
Description=This service automatically reports spam.
[Service]
Type=oneshot
ExecStart=/root/bin/spam-reporter
Then create the timer unit as /etc/systemd/system/spam-reporter.timer:
[Unit]
Description=This is the timer to check for spam and report it.
[Timer]
OnCalendar=*:0/5
Persistent=true
[Install]
WantedBy=timers.target
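The OnCalendar value *:0/5 means "every hour, at minute 0 and every 5 minutes after that". The small sketch below just enumerates those minutes to make the pattern concrete; the function name matching_minutes is invented for the example. On a host with systemd, systemd-analyze calendar '*:0/5' should show how systemd itself interprets the expression.

```shell
#!/bin/sh
# Hypothetical helper: list the minutes of each hour selected by the
# minute field "0/5" (start at 0, repeat every 5).
matching_minutes() {
    m=0
    while [ "$m" -lt 60 ]; do
        printf '%02d\n' "$m"
        m=$((m + 5))
    done
}

matching_minutes
```

That works out to twelve runs per hour, so a message dropped into the Spam folder should be reported within about five minutes.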
Then enable the timer with systemctl daemon-reload && systemctl enable spam-reporter.timer --now.
That’s it! Now if your users move mail into the Spam IMAP folder, it’ll get fed back into PMG’s Bayes filter as spam.