My plan for spam – one month on

A while back I wrote about “My plan for spam”.

“My plan” has been running for a month now, so it’s time to review it.

Before I implemented it I was getting around 600 spams a month.

After running it for a full month I’m down to around 250.

So, I’d consider that a fairly successful plan.

Of course, all the ones caught by my spam filter were being sent to my home account and put into a spam folder.

I said I’d monitor that and if there were no false positives I’d set it to delete them upon arrival.

There have been no false positives, so I will be setting it to delete them (I need to test that properly first, before I go putting delete rules into my filters).

As for getting that 250 even lower – I’m kinda stuck.

Around 80% of those 250 are sent directly to my gmail address, so they never touch my filtering system.
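To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted in this post (600 a month before, 250 after, roughly 80% of those sent straight to gmail), so the results are as rough as the inputs.

```python
# Monthly spam counts quoted in this post (all approximate).
before = 600          # spams per month before the plan
after = 250           # spams per month after one month of filtering
direct_share = 0.80   # rough fraction of the 250 sent straight to gmail

reduction = (before - after) / before
direct_to_gmail = round(after * direct_share)
via_my_server = after - direct_to_gmail

print(f"Reduction: {reduction:.0%}")               # Reduction: 58%
print(f"Straight to gmail: {direct_to_gmail}")     # Straight to gmail: 200
print(f"Still via my server: {via_my_server}")     # Still via my server: 50
```

In other words, if the 80% estimate holds, only around 50 a month are getting past the server-side filter; the rest bypass it entirely.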

I don’t really use my gmail address so I could set gmail to delete anything sent to that address.

The thought of doing that scares me a bit – I think I’ll wait and see how annoyed I get by it all.

Of course, no doubt in six months’ time I’ll be back up to 600 a month again, but what can you do?

Update:

I found out that if a gmail filter rule deletes an email, it goes into the Trash folder, which automatically clears out anything that has been in there for 30 days.

So, with that in mind, I’ve set it to delete any emails addressed to my gmail address.

That way I have 30 days to find a real email if I have reason to believe it was sent to my gmail address.

Since doing that I’ve received an average of just over one spam a day!

That’s going to be around 40 a month.

Eat that, spammers!
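Going from 600 a month to around 40 is a big drop. As a rough check in Python, assuming a 30-day month and taking the estimate of 40 a month at face value:

```python
before = 600      # spams per month before the original plan
now = 40          # estimated spams per month after the gmail delete rule
days = 30         # assumed length of a month

per_day = now / days
overall_reduction = (before - now) / before

print(f"About {per_day:.1f} spams a day")              # About 1.3 spams a day
print(f"Overall reduction: {overall_reduction:.0%}")   # Overall reduction: 93%
```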

My plan for spam

Every now and again I get sufficiently annoyed by spam to want to do something about it.

Today was one of those days where I had enough time to sit down and work on my plan for dealing with spam.

My email setup is a little unusual so it became a bit complicated.

All my incoming email is processed on my mail server; then one copy is forwarded to gmail, the other to my computer at home.

I then read all my email via gmail, and keep the copy at home just for backup purposes.

I’ve been doing this for a while, but found the manual task of scanning through spam in gmail tiresome and annoying.

What I wanted to do was cut out a lot of the spam on my server, before it was even passed on to gmail.

I figured I could script a better spam-filtering solution than gmail’s system.

My plan was to install bogofilter on the mail server and use that to filter out a lot of the spam.

There were two problems with this. The first was that I needed a corpus of spam and ham emails to train it with, and I didn’t keep email on the server.

The second was that it would need constant training as new emails came in. This was very tricky, as emails were passed on to gmail and my home computer by an automated process with no scope for human intervention.

To deal with the first problem, I decided to install bogofilter at home too, train it there, then upload the training database to the mail server.

For the second problem I came up with the following solution:

I would use bogofilter on the mail server to send anything flagged as non-spam on to both gmail and home, and anything classed as spam just to home.

Once it got home, it would be passed through bogofilter a second time. This instance would be set up slightly differently from the first, classifying emails into one of three folders: ham, spam or unsure.
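The routing in the two steps above can be sketched in Python. This is a minimal illustration, not bogofilter's own interface: it assumes each message already has a spam score between 0 and 1, and the cutoff values (SPAM_CUTOFF, HAM_CUTOFF) and destination names are hypothetical placeholders rather than settings taken from this setup.

```python
# Hypothetical thresholds: scores at or above SPAM_CUTOFF count as spam,
# scores at or below HAM_CUTOFF count as ham, anything between is unsure.
SPAM_CUTOFF = 0.99
HAM_CUTOFF = 0.45


def server_destinations(score: float) -> list[str]:
    """Server side (two-way): ham goes to gmail and home, spam only home."""
    if score >= SPAM_CUTOFF:
        return ["home"]
    return ["gmail", "home"]


def home_folder(score: float) -> str:
    """Home side (three-way): file the message as ham, spam or unsure."""
    if score >= SPAM_CUTOFF:
        return "spam"
    if score <= HAM_CUTOFF:
        return "ham"
    return "unsure"


for score in (0.10, 0.70, 0.999):
    print(score, server_destinations(score), home_folder(score))
```

Because every message still reaches home whatever the server decides, a borderline email that the server misclassifies can land in the unsure folder and be reviewed there.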

I would then use mutt to periodically re-train bogofilter, telling it whether each email in the “unsure” folder was ham or spam.
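Conceptually, retraining just adds the words of each hand-sorted message to the spam or ham counts. Here is a toy Python sketch of that idea, assuming a simple in-memory word-count "database"; bogofilter keeps its own on-disk wordlist, and the names used here (WordDB, tokens, train) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WordDB:
    """Toy training database: per-token counts plus message totals."""
    spam_tokens: Counter = field(default_factory=Counter)
    ham_tokens: Counter = field(default_factory=Counter)
    spam_messages: int = 0
    ham_messages: int = 0


def tokens(text: str) -> set[str]:
    # Count each distinct word once per message, case-insensitively.
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


def train(db: WordDB, text: str, is_spam: bool) -> None:
    """Register one hand-classified message as spam or ham."""
    if is_spam:
        db.spam_tokens.update(tokens(text))
        db.spam_messages += 1
    else:
        db.ham_tokens.update(tokens(text))
        db.ham_messages += 1


# Made-up messages from the "unsure" folder, labelled by hand.
db = WordDB()
train(db, "Cheap casino bonus, click now", is_spam=True)
train(db, "Lunch on Friday?", is_spam=False)
print(db.spam_tokens["casino"], db.ham_tokens["lunch"])
```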

Finally, the newly trained database would be copied back up to the mail server each night.

The more astute reader may have noticed a problem with this solution.

I seem to have replaced scanning a folder full of spam on gmail with scanning a folder full of spam at home.

This is true; initially I will be dealing with the same amount of spam.

However, I have a longer term plan here.

Once I’m happy with the filtering I’m going to tweak my solution so that anything tagged as spam will be deleted outright once it hits my home PC.

I expect this will reduce the amount of spam I see by about 90%.

I am comfortable with the fact that I will probably lose the occasional non-spam email.

I’m gonna run this system for about a month; if I get through that with zero false positives, I’ll feel brave enough to set it to delete.

Trackback spam

I got hit by a comment spam attack last night.

Woke up this morning to find 6 adverts for online casinos littered all over my site.

They were actually trackbacks rather than comments.

This on the very day I’d added a recent comments feature to my home page (it’s like they knew).

After deleting them and banning the IP addresses (the latter probably pointless), I started to think about ways to block them automatically.

Sadly, although I can spot them a mile off, I can’t program my computer to do the same.

It’s the same problem as writing an email spam filter.

I’ve largely been comment spam free until now. I assume this is because I don’t use Movable Type, so spam tools written for MT have no effect on my blog.

But now I guess the spammers have decided to smarten up their act.

I assume they are reading the XML I have on every page, which publishes the trackback URL in a standard format to enable autodiscovery.
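To show how little effort that takes: standard TrackBack autodiscovery embeds an RDF snippet (inside an HTML comment) in each page, and the ping URL sits in a trackback:ping attribute. A minimal Python sketch of pulling it out with a regular expression; the sample page, URLs and the find_ping_urls name are made up for illustration, not taken from this site.

```python
import re

# A made-up page fragment in the TrackBack autodiscovery style: an RDF
# description wrapped in an HTML comment so browsers don't render it.
PAGE = """
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="http://example.com/blog/2004/01/some-post"
    dc:identifier="http://example.com/blog/2004/01/some-post"
    dc:title="Some post"
    trackback:ping="http://example.com/trackback/123" />
</rdf:RDF>
-->
"""

# Matches the attribute value only, not the xmlns:trackback declaration.
PING_RE = re.compile(r'trackback:ping="([^"]+)"')


def find_ping_urls(html: str) -> list[str]:
    """Return every trackback ping URL advertised in the page."""
    return PING_RE.findall(html)


print(find_ping_urls(PAGE))
```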

Anyway, they started again just now.

I had a good look at the request headers and spotted just enough in them that differed from normal trackbacks (I don’t want to reveal what, of course).

So, I added a few lines to my Apache config to block all POSTs that carry this specific set of headers.
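The idea behind that rule, rejecting POSTs that carry a tell-tale combination of headers, can be sketched in Python. The header names and values below are invented placeholders (the real ones aren't revealed here), and looks_like_spam_trackback is a hypothetical helper, not the Apache config itself.

```python
# Hypothetical fingerprint: every listed header must be present with
# exactly this value for a request to be treated as spam.
FINGERPRINT = {
    "x-example-header": "some-value",
    "user-agent": "ExampleBot/1.0",
}


def looks_like_spam_trackback(method: str, headers: dict[str, str]) -> bool:
    """True if a POST carries every header in FINGERPRINT."""
    if method.upper() != "POST":
        return False
    # HTTP header names are case-insensitive, so compare them lower-cased.
    normalised = {name.lower(): value for name, value in headers.items()}
    return all(normalised.get(name) == value
               for name, value in FINGERPRINT.items())
```

Requiring the whole set to match, rather than any single header, keeps the chance of blocking a genuine trackback low.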

Seems to have stopped them for now.

Once they figure out a way around that though, I’m kinda stuffed. 🙁

Blocking evil spammer scum

Lately my site has been repeatedly attacked by some scumbag(s) looking for mail scripts that they can use to send out spam.

Naturally my site doesn’t have such a vulnerability but every one of their requests triggers a 404 which sends me an email.

As a typical attack involves 30 or 40 requests in the space of a minute, the email bombardment right narks me.

So, I’d like to be able to block them at the Apache level so I don’t get a 404 email.

However, it’s a distributed attack and they use an MSIE user agent, so I can’t block them that easily.

The requests do, however, have the following in common:

They are all POSTs, and they all set the referrer to my home page.

I don’t have any forms that use a POST from my home page so I can block those requests and not affect anyone but these lousy spammers.

A quick scan of the Apache docs and I came up with this:

RewriteCond %{REQUEST_METHOD} ^POST$
RewriteCond %{HTTP_REFERER} boncey\.org/$
RewriteRule ^.* - [F]

The $ at the end of the second line is vital: it means the condition only matches referrers that end in boncey.org/.

Without the $ I’d block every POST made from any page on my site, and I don’t want that. 🙂
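To see why the $ matters, here are the same two conditions mimicked in Python with the re module. It is only an approximation of mod_rewrite, which, like re.search, matches the pattern anywhere in the header unless it is anchored. should_forbid is a hypothetical helper and the referrer URLs are made-up examples.

```python
import re

# The referrer condition with and without the trailing $ anchor.
ANCHORED = re.compile(r"boncey\.org/$")
UNANCHORED = re.compile(r"boncey\.org/")


def should_forbid(method: str, referer: str, pattern: re.Pattern) -> bool:
    """Mimic the POST + Referer conditions that trigger the [F] rule."""
    return method == "POST" and pattern.search(referer) is not None


home = "http://www.boncey.org/"
post_page = "http://www.boncey.org/2004/05/some_post"

print(should_forbid("POST", home, ANCHORED))         # True
print(should_forbid("POST", post_page, ANCHORED))    # False
print(should_forbid("POST", post_page, UNANCHORED))  # True
```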

Man, I hate spammers.

spamassassin, procmail, grep yada yada yada

Following on from yesterday’s housekeeping I updated my spam filters today.

I use spamassassin in combination with procmail, which generally rocks.

It features Bayesian filtering, which is described neatly in Paul Graham’s A Plan For Spam.
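For the curious, the core of the approach in A Plan for Spam can be sketched in a few lines of Python. The sketch estimates a spam probability for each token from spam and ham counts, clamps it, then combines the most "interesting" ones. This is a simplified illustration of Graham's scheme: it leaves out refinements from the essay such as doubling ham counts and ignoring rare words. It is not how spamassassin's Bayes module is implemented, and the counts are made up.

```python
from math import prod


def token_prob(spam_count: int, ham_count: int,
               n_spam: int, n_ham: int) -> float:
    """Per-token spam probability, clamped to [0.01, 0.99] as in the essay."""
    b = min(1.0, spam_count / n_spam)
    g = min(1.0, ham_count / n_ham)
    if b + g == 0:
        return 0.4  # never-seen token: Graham's default guess
    return max(0.01, min(0.99, b / (b + g)))


def combine(probs: list[float], top_n: int = 15) -> float:
    """Combine the top_n probabilities furthest from a neutral 0.5."""
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(interesting)
    ham = prod(1 - p for p in interesting)
    return spam / (spam + ham)


# Made-up counts from a corpus of 100 spams and 100 hams.
probs = [
    token_prob(90, 1, 100, 100),   # e.g. "casino"
    token_prob(40, 5, 100, 100),   # e.g. "click"
    token_prob(2, 60, 100, 100),   # e.g. "lunch"
]
print(round(combine(probs), 3))
```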

The only problem is it requires occasional manual training.

This involves going through my spam mail folder, checking that everything spamassassin automatically tagged as spam really was spam, and then “teaching” spamassassin’s Bayesian filter to regard it all as spam.

The more this is done the better spamassassin becomes at recognising spam.

The converse is that I also have to run it over my inbox from time to time to tell it that my inbox isn’t spam.
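One way to do that training is spamassassin's sa-learn tool, which takes --spam or --ham followed by the messages to learn from. A small Python sketch that builds both commands and only runs them when asked. It assumes sa-learn is on the PATH; the spam folder is the one from the grep command at the end of this post, while the inbox path is a hypothetical example.

```python
import subprocess
from pathlib import Path

# Spam folder as used in the grep command below; the inbox path is assumed.
SPAM_DIR = Path.home() / "Maildir/misc/spam/cur"
INBOX_DIR = Path.home() / "Maildir/cur"


def training_commands(spam_dir: Path, ham_dir: Path) -> list[list[str]]:
    """Build sa-learn commands: spam folder as spam, inbox as ham."""
    return [
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def run_training(dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in training_commands(SPAM_DIR, INBOX_DIR):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_training()  # prints only; pass dry_run=False to actually train
```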

As I checked each spam, I noticed some of the scores spamassassin had allocated them.

I flag any email with a score over 5 as spam, but I saw some as high as 20.

Curious about which was highest, I grepped my spam folder and found that the top score was 29.20 points!

I looked at the mail that triggered such a high score and it read like a spammer’s guide to annoying people.

It was selling software to help me spam people more effectively; how ironic.

For the curious geek, the command line I used to find the score was:

grep "^Content analysis details" ~/Maildir/misc/spam/cur/* | grep -o "[[:digit:]]*\.[[:digit:]]* points" | sort -u -n

grep -o is the dog’s nads by the way. 🙂
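For anyone without a handy grep -o, much the same search can be done in Python. This is a sketch that assumes, like the command above, that each spam is a separate file in the Maildir folder and that spamassassin's report contains a "Content analysis details: (N.NN points" line. highest_score_in and highest_score are hypothetical helpers.

```python
import re
from collections.abc import Iterable
from pathlib import Path

# Like the grep pipeline: find the report line and pull out "N.NN points".
SCORE_RE = re.compile(r"^Content analysis details.*?(\d+\.\d+) points", re.M)


def highest_score_in(texts: Iterable[str]) -> float:
    """Highest spamassassin score found across the given message texts."""
    scores = [float(s) for text in texts for s in SCORE_RE.findall(text)]
    return max(scores, default=0.0)


def highest_score(folder: Path) -> float:
    """Highest score across every message file in a Maildir folder."""
    return highest_score_in(
        path.read_text(errors="replace")
        for path in folder.iterdir()
        if path.is_file()
    )


spam_dir = Path.home() / "Maildir/misc/spam/cur"
if spam_dir.is_dir():
    print(highest_score(spam_dir))
```

Unlike sort -u -n, which lists every distinct score and leaves the highest at the bottom, this returns just the maximum.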