SpamAssassin: it works
7:42 PM, Mon, Oct 29 2007
The problem: Chiral Software has had a domain name and website since we incorporated in November of 2003. My email address has been listed on the website from the very beginning. Spammers use web crawlers to harvest email addresses from websites. Like all websites, Chiral Software has been crawled repeatedly since its launch, and all of our publicly listed email addresses have received more and more spam over the years.
Eventually, spam rates went over 99%, and picking out the real emails became time-consuming. Despite arrests, convictions, and even the murder of a "spam king", spam is growing every day. A spam blocker became essential, so that we could be sure of receiving real emails and communicating with customers.
The Apache SpamAssassin Project is a well-regarded spam-fighting tool. First, installation. I won't go over how I did it because that is covered in detail on the project's website. I chose a site-wide installation, so that everyone here would be protected without having to do anything. SpamAssassin integrates fairly easily with Postfix. It is necessary to download the software, write a small shell script, and do some configuration. All in all, quite painless for someone experienced in dealing with Postfix and Linux.
However, dealing with spam is such an essential part of email today that I think it is time for the Mail Transfer Agents (MTAs, such as Postfix) to agree on a spam-blocking module system, much as web servers share the Common Gateway Interface (CGI) so that software written in any language can interface easily with many different web servers. If MTAs had an Antispam Gateway Interface (AGI), it would be even easier to install SpamAssassin or other tools. Spammers are constantly adapting their techniques to get around spam filters, so the more flexibility we have with anti-spam tools, the quicker we can respond.
If there were an Antispam Gateway Interface (AGI) it would need, at a minimum, interface or callback points such as:
- Callback upon incoming SMTP connection, before even saying 'helo', so it could do things like reject connections based on IP, or counter-attack based on IP. A typical IP-based counter-attack would be to abandon the connection, but leave it hung open, so the spam-sending machine gets blocked.
- Callback before the message body is received
- Callback with the message headers and body, which could reject it, or accept it, and could take other actions such as adding a hash code or doing a counter-attack
- Callbacks for relaying. ISPs would want to use this to prevent their own customers from sending spam, usually as botnet zombies.
- Ability to do chaining, much like the Apache Httpd Module interface, or the Servlet Filter interface. Chaining would allow more than one anti-spam tool to work on a message, optionally modifying the message, passing it on to the next tool in the chain, or terminating the chain.
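To make the callback points above a little more concrete, here is a minimal Python sketch of what such an interface might look like. No such standard exists, so everything in it is hypothetical: the names AntispamFilter, Verdict, Message, run_chain, and the three example filters are invented for illustration, and a real MTA would also have to deal with timeouts, errors, and SMTP reply codes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    CONTINUE = "continue"  # no opinion: pass to the next filter in the chain
    ACCEPT = "accept"      # terminate the chain and accept
    REJECT = "reject"      # terminate the chain and refuse


@dataclass
class Message:
    client_ip: str
    body: str
    headers: dict = field(default_factory=dict)


class AntispamFilter:
    """Hypothetical AGI filter. Every hook defaults to 'no opinion'."""

    def on_connect(self, client_ip):                     # before HELO
        return Verdict.CONTINUE

    def on_envelope(self, client_ip, sender, recipients):  # before the body
        return Verdict.CONTINUE

    def on_message(self, msg):                           # headers and body
        return Verdict.CONTINUE

    def on_relay(self, msg):                             # outbound relaying
        return Verdict.CONTINUE


class IpBlocklist(AntispamFilter):
    def __init__(self, blocked):
        self.blocked = set(blocked)

    def on_connect(self, client_ip):
        return Verdict.REJECT if client_ip in self.blocked else Verdict.CONTINUE


class HeaderTagger(AntispamFilter):
    """Modifies the message, then lets the chain continue."""

    def on_message(self, msg):
        msg.headers["X-AGI-Checked"] = "yes"
        return Verdict.CONTINUE


class KeywordFilter(AntispamFilter):
    def __init__(self, phrases):
        self.phrases = [p.lower() for p in phrases]

    def on_message(self, msg):
        text = msg.body.lower()
        if any(p in text for p in self.phrases):
            return Verdict.REJECT
        return Verdict.CONTINUE


def run_chain(filters, hook, *args):
    """Call the named hook on each filter in order; stop at the first verdict."""
    for f in filters:
        verdict = getattr(f, hook)(*args)
        if verdict is not Verdict.CONTINUE:
            return verdict
    return Verdict.ACCEPT  # nobody objected


if __name__ == "__main__":
    chain = [IpBlocklist({"203.0.113.7"}), HeaderTagger(), KeywordFilter(["cheap pills"])]
    print(run_chain(chain, "on_connect", "203.0.113.7"))
    print(run_chain(chain, "on_message", Message("198.51.100.1", "Buy cheap pills now")))
```

The point of the sketch is the chaining: each tool sees the message in turn, may modify it, and may either pass it on or end the chain.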
Hopefully some type of standard anti-spam software interface will be developed by the major players in that field. And now back to SpamAssassin.
SpamAssassin comes with some basic rule-based filtering capabilities, and a flexible plug-in system. The basic rule-based filtering lets the site administrator block based on various attributes of the messages themselves. Pharmaceutical ads, male enhancement ads, too much HTML in the message, known-spam mail agents, and malformed emails are some of the things it looks for. I installed it and turned on these default rules. Doing so cut the spam by about 50%. This was helpful, but I wanted to reduce spam to a small handful of messages per day at most, so I still had a ways to go.
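As a rough illustration of how score-based rule filtering works in general, here is a toy Python sketch. It is not SpamAssassin's code or rule set: the rule names, patterns, and point values are all made up for the example. The only thing taken from SpamAssassin is the overall idea that each matching rule adds points and a message is flagged once the total reaches a threshold (5 by default).

```python
import re


def html_ratio(text):
    """Fraction of the message's characters that sit inside HTML tags."""
    if not text:
        return 0.0
    tag_chars = sum(len(tag) for tag in re.findall(r"<[^>]+>", text))
    return tag_chars / len(text)


# (name, test function, points). All three rules are invented for illustration.
RULES = [
    ("PHARMA_AD", lambda t: bool(re.search(r"\b(pharmacy|pills)\b", t, re.I)), 2.5),
    ("ENHANCEMENT_AD", lambda t: bool(re.search(r"\benlargement\b", t, re.I)), 3.0),
    ("HTML_HEAVY", lambda t: html_ratio(t) > 0.5, 1.5),
]


def score_message(text):
    """Return (total points, names of the rules that matched)."""
    hits = [(name, points) for name, test, points in RULES if test(text)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def is_spam(text, threshold=5.0):
    total, _ = score_message(text)
    return total >= threshold


if __name__ == "__main__":
    for sample in ["Lunch on Friday?", "pills enlargement", "<b><i><u>pills</u></i></b>"]:
        print(sample, score_message(sample), is_spam(sample))
```

No single rule is enough to condemn a message here, which is the usual reason for scoring rather than hard blocking: one unlucky word in a real email does not get it thrown away.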
Next I turned on the Bayesian filter module. This required using a corpus of spam and a corpus of non-spam to train the filter. This helped some, but I was still getting about 60 to 100 spams per day, and my objective was to get that number into the low single digits.
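For readers who have not met Bayesian filtering before, the idea of training on two corpora can be sketched in a few lines of Python. This is a textbook naive Bayes classifier with add-one smoothing, not SpamAssassin's implementation, which is more elaborate; the class name TinyBayes and the tiny training messages are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Toy naive Bayes classifier trained on labelled 'spam' and 'ham' text."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        if self.docs["spam"] == 0 or self.docs["ham"] == 0:
            raise ValueError("train on at least one spam and one ham message first")
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the prior odds, then add evidence from each known token.
        log_odds = math.log(self.docs["spam"]) - math.log(self.docs["ham"])
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # never seen in training: no evidence either way
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + len(vocab))
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + len(vocab))
            log_odds += math.log(p_spam) - math.log(p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    nb = TinyBayes()
    nb.train("cheap pills online", "spam")
    nb.train("win money now", "spam")
    nb.train("meeting agenda attached", "ham")
    nb.train("lunch on friday", "ham")
    print(nb.spam_probability("cheap pills"))
    print(nb.spam_probability("meeting friday"))
```

The quality of such a filter depends almost entirely on the corpora it is trained on, which is why the training step matters.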
Upon the advice of Alan, I turned on the network tests and activated Razor. Again there was some installation and configuration involved, but it all went smoothly, and it was a success. Now spam rates are in the low single digits per day. There haven't been any false positives yet. This is with quite conservative settings, with a spam points threshold of 8, instead of the default 5. I've won back my email.
I would recommend SpamAssassin for any site that runs its own email. You'll get some benefit without using the network tests, but to win back control of your email, you must activate the network tests.