Tuesday, August 15

my spam filter can beat up your spam filter

I have email accounts on both Yahoo and Google (Gmail), and it's interesting to note how much better the Gmail spam filter is. It appears to be nearly 100% accurate in separating the Spam from the Ham, while Yahoo's filter is getting worse by the day - lots of stuff is seeping through lately.

Easily 75% of my email is now spam (on one mostly-dormant account, it's more like 99%), so it's important to trap as much of this stuff as possible. I've been training the Spamato extension to Thunderbird for a few weeks, and it's only moderately successful so far (remember that it's a second line of defense, since either Gmail or Yahoo has already tried to identify the stuff).

One of the puzzling things is that email from several large companies is consistently marked as Spam when it's not. I gave them my address. Since some of the filters depend on how many other users "vote" on whether a certain message is spam or ham, that means there are a LOT of incredibly stupid users out there who proclaimed an email to be Spam when it clearly is not. All they have to do is [unsubscribe]* and it goes away.
* yes, I know there are some hairball companies out there who don't tell you how to unsubscribe, or tell you that you're off their list when you're really not. I'm not talking about them. I'm talking about the large B&M (brick and mortar) operations that are reputable. When their messages are marked as Spam, something's wrong.
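
To make the "voting" idea concrete, here's a rough Python sketch of how a vote-based filter might decide. This is purely my own illustration: the function name, the minimum vote count, and the 80% threshold are all made up, and it is not how Gmail, Yahoo, or any of the Spamato filters actually work.

```python
from collections import Counter

def classify_by_votes(votes, min_votes=5, spam_ratio=0.8):
    """Return 'spam', 'ham', or 'unknown' from a list of user votes.

    votes is a list of the strings 'spam' or 'ham'. Both thresholds are
    hypothetical values chosen only for this illustration.
    """
    counts = Counter(votes)
    total = counts["spam"] + counts["ham"]
    if total < min_votes:
        # Too few votes to say anything either way.
        return "unknown"
    # Call it spam only when a large share of voters agree.
    return "spam" if counts["spam"] / total >= spam_ratio else "ham"

print(classify_by_votes(["spam"] * 9 + ["ham"]))
print(classify_by_votes(["spam"] * 3 + ["ham"] * 7))
```

With a scheme like this, a legitimate mailing only gets flagged if most of the people who voted on it hit the Spam button, which is exactly why those mismarked messages bug me.
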
Spamato is interesting in that this open source product has six (6) filters: Bayesianato, Comha, Domainator, Earlgrey, Razor, and Ruleminator. It has a nice bar chart that shows the effectiveness of each filter; in my case the "Ruleminator" is the most effective and the "Bayesianato" is the least effective. That said, there's a nice ability to add rules of your own when the spam that wasn't already trapped fits certain patterns. I haven't tweaked it yet, but that's coming Real Soon Now.
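
For the curious, a pattern-based rule boils down to something like the Python sketch below. It's only a guess at the general idea, not Spamato's actual rule format: the RULES list, the matches_rule name, and the two example patterns are all hypothetical.

```python
import re

# Hypothetical rules: (message field, compiled pattern). Neither pattern
# comes from Spamato; they only show what a "pattern" might look like.
RULES = [
    ("subject", re.compile(r"\bv[i1]agra\b", re.IGNORECASE)),
    ("from", re.compile(r"@example\.invalid$", re.IGNORECASE)),
]

def matches_rule(message, rules=RULES):
    """Return True if any rule's pattern is found in its message field.

    message is a plain dict such as {"subject": "...", "from": "..."};
    missing fields are treated as empty strings.
    """
    return any(pattern.search(message.get(field, "")) for field, pattern in rules)

print(matches_rule({"subject": "Cheap V1AGRA now"}))
print(matches_rule({"subject": "Lunch?", "from": "friend@example.com"}))
```

The appeal of rules like these is that once one piece of junk slips through, you can describe its shape and catch the next hundred that look like it.
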

Today's mandatory read: Molly Ivins' latest column. As always, she's dead on.
