Spam: A Very Quick Discussion

by Russ Smith

This is a quick reference for setting up server-side spam filtering on CSX. See the full document for a thorough discussion.

SpamAssassin

Running cp /pub/htdocs/spamassassin.ex .procmailrc from your home directory should be all that's required to get SpamAssassin going. (Note that this replaces any .procmailrc you already have.) The effect is that all mail that SpamAssassin presumes to be spam is filed in a folder named Spam instead of the user's inbox. (By default, the folder is under the directory ~/mail.)
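
Because cp overwrites its destination, it can be worth keeping a copy of any existing .procmailrc first. The following is a minimal sketch of one way to do that, assuming a POSIX shell; the function name install_procmailrc and the .bak suffix are made up for illustration and are not part of the CSX setup.

```shell
# Hypothetical helper: install a procmail recipe file as ~/.procmailrc,
# first saving any existing file as ~/.procmailrc.bak.
# (An older .bak file, if present, is overwritten.)
install_procmailrc() {
    src=$1
    dest=${2:-$HOME/.procmailrc}   # second argument is optional
    if [ -e "$dest" ]; then
        # -p preserves the original file's mode and timestamps
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}

# Usage with the example file named above (commented out so that
# sourcing this file changes nothing):
#   install_procmailrc /pub/htdocs/spamassassin.ex
```

If the new recipes misbehave, copying .procmailrc.bak back over .procmailrc restores the previous setup.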

Spamprobe

Generally, there are a few stages to implementing Spamprobe:

  1. Collect spam messages for several days, ideally refiling them into a folder (hereafter called Spam).
  2. Run spamprobe for the first time to prime it with known spam and known non-spam messages; the more of each, the better. Assuming that a folder named Saved contains non-spam messages and that your mail folders are under the directory mail (check your own setup, as this may not match), you would do something like this:
    1. spamprobe -c good mail/Saved
    2. spamprobe -c spam mail/Spam
  3. Next, run cp /pub/htdocs/spamprobe.ex .procmailrc. This gives results similar to the SpamAssassin setup above, but also continues to train Spamprobe.
  4. If a message is misclassified (a false positive or a false negative), you will need to train spamprobe manually to improve its behavior. Save the message as a file (hereafter called File) and run spamprobe train-spam File if it is a false negative (spam that reached your inbox). If it is a false positive (legitimate mail filed as spam, which is far less common), use train-good instead.
  5. Run spamprobe cleanup periodically; running it at least daily is vital to keep disk usage down. (If neglecting this leads to disk overuse, your spamprobe databases may be deleted, and you would have to start training over.) One way to automate this is to run crontab -e and add a line like this:
    0 1 * * * /usr/bin/spamprobe cleanup
    which runs spamprobe cleanup every morning at 1am.
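
As an alternative to editing the crontab by hand, the cleanup entry from step 5 can be added non-interactively. This is only a sketch, assuming a POSIX shell and a crontab command that accepts -l and reads a new table from standard input when given -; the names CLEANUP_JOB and add_cleanup_job are invented for this example.

```shell
# The crontab entry from step 5.
CLEANUP_JOB='0 1 * * * /usr/bin/spamprobe cleanup'

# Hypothetical helper: read a crontab on stdin and write it to stdout,
# appending the cleanup job only if that exact line is not already there.
add_cleanup_job() {
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -F: fixed string (the asterisks are literal), -x: whole-line match
    if ! printf '%s\n' "$current" | grep -Fxq -- "$CLEANUP_JOB"; then
        printf '%s\n' "$CLEANUP_JOB"
    fi
}

# Usage (commented out so that sourcing this file changes nothing):
#   crontab -l 2>/dev/null | add_cleanup_job | crontab -
```

Because the helper checks for the exact line before appending, running it more than once should not create duplicate entries. Running crontab -l afterwards shows the installed table.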

Questions about these or other approaches are always welcome at the usual system manager address.