Bogofilter: A New Spam Filter

According to Linux Weekly News, Eric Raymond is writing a new spam filter called bogofilter, based on the Bayesian analysis suggested by Paul Graham. Unlike the excellent SpamAssassin, which merely requires whitelisting a small number of addresses, bogofilter must be trained on around 1,000 e-mail messages. But bogofilter may ultimately offer more hope for defeating spam.

Once trained, bogofilter recognizes most incoming spam (reportedly about as much as SpamAssassin does, but we'll have to wait and see). More importantly, bogofilter is very good at not flagging legitimate e-mail as spam; in other words, it has a very low false positive rate.

The real strength of bogofilter, though, is the training process. Because each user trains bogofilter on their own mail, each user gets a personalized spam filter. This means that (1) information of professional interest to the reader will generally be recognized as non-spam, however incriminating it might otherwise look, and (2) there is no centralized list of rules for spammers to study and work around.
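To make the training idea concrete, here is a minimal Python sketch of the kind of per-user Bayesian filter Paul Graham described in "A Plan for Spam". It is not bogofilter's actual code. The tokenizer, the `BayesFilter` class, and its constants are assumptions modeled on Graham's essay: good-mail counts are doubled to bias against false positives, rarely seen tokens get a neutral-ish 0.4, per-token probabilities are clamped to [0.01, 0.99], and only the 15 most decisive tokens are combined. Counting each token once per message is a further simplification.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, $, ' and -.
    # Returns a set, so each token is counted once per message.
    return set(re.findall(r"[a-z0-9$'\-]+", text.lower()))


class BayesFilter:
    """Sketch of a per-user Bayesian spam filter in the style of Graham's essay."""

    def __init__(self):
        self.good = Counter()  # token -> number of legitimate messages containing it
        self.bad = Counter()   # token -> number of spam messages containing it
        self.ngood = 0
        self.nbad = 0

    def train(self, text, is_spam):
        # Each user trains their own instance, so the counts reflect their own mail.
        tokens = tokenize(text)
        if is_spam:
            self.bad.update(tokens)
            self.nbad += 1
        else:
            self.good.update(tokens)
            self.ngood += 1

    def token_prob(self, tok):
        g = 2 * self.good[tok]  # doubling good counts biases against false positives
        b = self.bad[tok]
        if g + b <= 5:
            return 0.4  # too rare to judge: assume slightly innocent
        pg = min(1.0, g / max(self.ngood, 1))
        pb = min(1.0, b / max(self.nbad, 1))
        p = pb / (pg + pb)
        return max(0.01, min(0.99, p))  # never fully certain about one token

    def spam_prob(self, text):
        # Keep the 15 probabilities farthest from 0.5 (the most "interesting").
        probs = sorted((self.token_prob(t) for t in tokenize(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:15]
        prod, inv = 1.0, 1.0
        for p in probs:
            prod *= p
            inv *= 1.0 - p
        # Naive Bayes combination of the selected token probabilities.
        return prod / (prod + inv)
```

A filter trained on one user's mail scores that user's professional vocabulary as innocent, while a different user's filter, trained on different mail, may not. There is no shared rule set for a spammer to read.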

I suspect that the new Mail client in Mac OS X 10.2 may be using a similar technique.
