SpamAssassin: it works

Category: Linux

7:42 PM, Mon, Oct 29 2007

The problem: Chiral Software has had a domain name and website since we incorporated in November 2003, and my email address has been listed on the website from the very beginning. Spammers use web crawlers to harvest email addresses from websites. Like virtually every public website, ours has been crawled repeatedly since its launch, and all of our publicly listed email addresses have received more spam every year.

Eventually the spam rate went over 99%, and picking out the real emails had become time-consuming. Despite arrests, convictions, and even the murder of a "spam king", spam keeps growing every day. A spam blocker became essential so that we could be sure of receiving legitimate email and staying in touch with customers.

The Apache SpamAssassin Project produces a well-regarded spam-fighting tool. First, installation. I won't go into the details, because they are covered thoroughly on the SpamAssassin website. I chose a site-wide installation, so that everyone here would be protected without having to do anything. SpamAssassin integrates fairly easily with Postfix: you download the software, write a small shell script, and do some configuration. All in all, it was quite painless for someone experienced with Postfix and Linux.

However, dealing with spam is now such an essential part of email that I think it is time for the Mail Transfer Agents (MTAs, such as Postfix) to agree on a standard spam-blocking module system. Web servers already have something similar in the Common Gateway Interface (CGI), which lets software written in any language work with many different web servers. If MTAs had an Antispam Gateway Interface (AGI), it would be even easier to install SpamAssassin or other filters. Spammers constantly adapt their techniques to get around spam filters, so the more flexibility we have with anti-spam tools, the faster we can respond.

If there were an Antispam Gateway Interface (AGI) it would need, at a minimum, interface or callback points such as:

Hopefully the major players in the field will develop some kind of standard anti-spam software interface. And now back to SpamAssassin.
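The installation step described earlier involves a small shell script that sits between Postfix and SpamAssassin. As an illustration only, here is a minimal sketch of the same idea in Python, modeled on the simple content-filter pattern from Postfix's FILTER_README. Postfix pipes each message to the script, the script runs it through spamc (the client for the spamd daemon), and the result is handed back to Postfix's sendmail for delivery. The install paths, the error handling, and the assumption that a pipe entry in master.cf invokes it with "-f ${sender} -- ${recipient}" are assumptions, not the exact setup used here.

```python
#!/usr/bin/env python3
"""Minimal Postfix content filter sketch: run each message through spamc, then reinject it.

Assumes a Postfix pipe(8) service in master.cf calls this script with
"-f ${sender} -- ${recipient}", arguments that are passed straight on to sendmail.
"""
import subprocess
import sys

SPAMC = "/usr/bin/spamc"         # assumed path to the spamd client
SENDMAIL = "/usr/sbin/sendmail"  # assumed path to Postfix's sendmail
EX_TEMPFAIL = 75                 # sysexits.h code: tells Postfix to retry later


def build_sendmail_argv(args):
    """Build the reinjection command, forwarding '-f sender -- recipients'."""
    # -G: gateway (relay) submission; -i: a line with only '.' does not end input.
    return [SENDMAIL, "-G", "-i", *args]


def filter_message(raw, run=subprocess.run):
    """Pipe the raw message through spamc and return whatever spamc prints.

    Normally that is the message with SpamAssassin's headers added.
    """
    result = run([SPAMC], input=raw, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(f"spamc exited with status {result.returncode}")
    return result.stdout


def main(args):
    raw = sys.stdin.buffer.read()
    try:
        tagged = filter_message(raw)
        subprocess.run(build_sendmail_argv(args), input=tagged, check=True)
    except (OSError, RuntimeError, subprocess.CalledProcessError):
        # Never drop mail on error: ask Postfix to defer it and try again.
        return EX_TEMPFAIL
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Postfix always supplies sender and recipient arguments; with none,
    # there is nothing to deliver, so the script does nothing.
    sys.exit(main(sys.argv[1:]))
```

Returning a temporary-failure code rather than discarding the message is the conservative choice: if spamd is down, mail is delayed instead of lost.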

SpamAssassin comes with some basic rule-based filtering capabilities and a flexible plug-in system. The rule-based filtering lets the site administrator block messages based on various attributes of the messages themselves. Pharmaceutical ads, male enhancement ads, too much HTML in the message, mail agents known to be used by spammers, and malformed emails are some of the things it looks for. I installed it and turned on these default rules, which cut the spam by about 50%. That was helpful, but I wanted to reduce spam to a small handful of messages per day at most, so I still had a ways to go.
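SpamAssassin's rules work on a points system: each rule that matches adds its score to the message's total, and the message is flagged as spam once the total reaches the threshold (5 by default). A toy version of that idea is sketched below. The rule names, patterns, and scores are invented for illustration and are not taken from SpamAssassin's actual rule set, which is much larger and can target headers, body text, or raw HTML separately.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str     # hypothetical rule name, for illustration only
    pattern: str  # regular expression matched against the message text
    score: float  # points added to the total when the pattern matches


# Made-up rules and scores, loosely mirroring the kinds of checks described above.
RULES = [
    Rule("PHARMA_WORDS", r"(?i)\b(viagra|cialis|pharmacy)\b", 2.5),
    Rule("ENHANCEMENT_AD", r"(?i)\benlarge(ment)?\b", 2.0),
    Rule("HTML_HEAVY", r"(?i)(?:<(?:font|table|td)\b[^>]*>.*?){5,}", 1.5),
    Rule("FREE_OFFER", r"(?i)\b100% free\b", 1.0),
]


def score_message(text, rules=RULES):
    """Return the total score and the names of the rules that fired."""
    hits = [r for r in rules if re.search(r.pattern, text, re.DOTALL)]
    return sum(r.score for r in hits), [r.name for r in hits]


def is_spam(text, threshold=5.0):
    """A message counts as spam when its total score reaches the threshold."""
    total, _ = score_message(text)
    return total >= threshold
```

For example, "Buy VIAGRA from our pharmacy! Enlargement pills, 100% FREE shipping." scores 5.5 here, so it is flagged at the default threshold of 5 but would slip through at a threshold of 8. That trade-off is why a higher threshold is the conservative setting when avoiding false positives matters most.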

Next I turned on the Bayesian filter module. This required training the filter on a corpus of spam and a corpus of non-spam. It helped some, but I was still getting about 60 to 100 spams per day, and my objective was to get that number into the low single digits.
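In SpamAssassin the training itself is done with the sa-learn tool (sa-learn --spam on a collection of spam, sa-learn --ham on a collection of good mail). The underlying idea can be sketched with a simple naive Bayes token classifier like the one below. This is a deliberate simplification, not SpamAssassin's implementation: its Bayes module combines token probabilities with Gary Robinson's chi-squared method, and the tokenizer, smoothing, and class names here are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; real filters also look at headers, URLs, and more."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Toy spam classifier; train on both corpora before scoring."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}  # per-class document counts
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one message from the spam or ham corpus."""
        self.tokens[label].update(set(tokenize(text)))
        self.docs[label] += 1

    def spam_probability(self, text):
        """P(spam | tokens present), assuming tokens are independent."""
        log_odds = math.log(self.docs["spam"] / self.docs["ham"])  # class prior
        for tok in set(tokenize(text)):
            # Laplace smoothing so an unseen token cannot zero out either class.
            p_spam = (self.tokens["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function of the log-odds.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1 + z)


def train_from_corpora(spam_corpus, ham_corpus):
    """Build a model from a list of spam messages and a list of non-spam messages."""
    model = NaiveBayes()
    for msg in spam_corpus:
        model.train(msg, "spam")
    for msg in ham_corpus:
        model.train(msg, "ham")
    return model
```

Trained on spam such as "cheap pills buy now" and ham such as "meeting notes for monday", the model rates "buy cheap pills" as very likely spam and "monday meeting notes" as very likely ham. The quality of the corpora matters more than the math: a filter trained on unrepresentative mail will learn the wrong tokens.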

On Alan's advice, I turned on the network tests and activated Razor. Again there was some installation and configuration involved, but it all went smoothly, and it was a success: spam is now down to the low single digits per day, and there haven't been any false positives yet. That is with quite conservative settings, a spam score threshold of 8 points instead of the default 5. I've won back my email.
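Razor is a collaborative network: users report spam, signatures of reported messages are stored on shared servers, and other sites check incoming mail against them. The sketch below shows only the general idea. It uses an exact SHA-1 digest of the case- and whitespace-normalized body and an in-memory set in place of the network; real Razor uses its own signature algorithms (some designed to tolerate small changes to a message) and talks to remote servers. The class name, the normalization, and the 3-point score for a hit are hypothetical.

```python
import hashlib
import re


def body_signature(body):
    """SHA-1 digest of the body after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SharedSpamCatalog:
    """Hypothetical in-memory stand-in for a shared catalog of reported spam."""

    def __init__(self, reported=()):
        self._signatures = {body_signature(b) for b in reported}

    def report(self, body):
        """A user reports a message as spam; only its signature is stored."""
        self._signatures.add(body_signature(body))

    def check(self, body):
        """True if a message with the same signature has already been reported."""
        return body_signature(body) in self._signatures


NETWORK_HIT_SCORE = 3.0  # hypothetical points added when the catalog matches


def total_score(local_score, body, catalog):
    """Combine the local rule/Bayes score with the network check."""
    return local_score + (NETWORK_HIT_SCORE if catalog.check(body) else 0.0)
```

Because every participating site benefits from every report, a spam run seen elsewhere can be caught here on its first arrival. In SpamAssassin the network tests include several independent checks (Razor, DNS blocklists, and others), each contributing its own points, which is how spam can still clear a conservative threshold like 8.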

I would recommend SpamAssassin for any site that runs its own email. You'll get some benefit without the network tests, but to really win back control of your email, you must activate them.