Wednesday, July 23, 2003


Spam: Bayesian Throttling.

The thing about spam classifiers is, they make mistakes sometimes.  The good ones, like SpamBayes, don't make all that many mistakes.  But when you're making a back-end spam solution, you really can't afford to have any rejected emails.  You have to let everything through and let the user make the final decision.

So where can the mostly accurate nature of a Bayesian classifier help us out? We can use it to throttle back spam from a given source.

Most emails come routed to us through intermediate servers, but increasingly, we are receiving emails directly from the servers that are the culprits.  It's pretty straightforward to use Bayesian classification on the emails that are coming in; when we see that the percentage of spam from a given server is getting high, we throttle back substantially on what we'll allow that server to send us.  That way we can make it inefficient to target us.
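
Here's a rough sketch of what that bookkeeping might look like in Python.  It's only an illustration, not a finished design: the class name, the thresholds, and the rates are all made up for the example, and it assumes some classifier (SpamBayes or otherwise) has already given each message a spam score between 0 and 1.  Nothing gets rejected here; the sketch only decides how fast a server is allowed to send.

```python
from collections import defaultdict, deque


class SourceThrottle:
    """Track recent spam verdicts per sending server and derive a rate limit.

    All names and default numbers here are hypothetical, chosen for illustration.
    """

    def __init__(self, window=100, min_samples=20, spam_cutoff=0.9,
                 ratio_limit=0.8, normal_rate=60, throttled_rate=2):
        self.window = window                  # how many recent messages to remember
        self.min_samples = min_samples        # don't judge a server on too little mail
        self.spam_cutoff = spam_cutoff        # score at or above this counts as spam
        self.ratio_limit = ratio_limit        # spam fraction that triggers throttling
        self.normal_rate = normal_rate        # messages per minute, normal case
        self.throttled_rate = throttled_rate  # messages per minute when throttled
        self.history = defaultdict(lambda: deque(maxlen=self.window))

    def record(self, server, spam_score):
        """Remember whether the classifier thought this message was spam."""
        self.history[server].append(spam_score >= self.spam_cutoff)

    def spam_ratio(self, server):
        """Fraction of remembered messages from this server that looked like spam."""
        verdicts = self.history.get(server)
        if not verdicts:
            return 0.0
        return sum(verdicts) / len(verdicts)

    def allowed_rate(self, server):
        """Messages per minute we are willing to accept from this server."""
        verdicts = self.history.get(server, ())
        if len(verdicts) < self.min_samples:
            return self.normal_rate
        if self.spam_ratio(server) >= self.ratio_limit:
            return self.throttled_rate
        return self.normal_rate
```

The mail server would call record() after classifying each message and check allowed_rate() before accepting more mail from that server.  How the rate actually gets enforced (delaying or deferring connections, say) is left out.  Because the history is a sliding window, a server that cleans up its act works its way back to the normal rate.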

If the big services could do this, then we could make pretty good headway against spam.  It should be relatively simple to bind it into the various server environments.


7:57:37 PM