<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Random Hacks: Bogofilter: A New Spam Filter</title>
    <link>http://www.randomhacks.net/articles/2002/09/13/bogofilter</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Technology and Other Fun Stuff</description>
    <item>
      <title>Bogofilter: A New Spam Filter</title>
      <description>    &lt;p&gt;According to &lt;a href='http://lwn.net/Articles/9185/'&gt;Linux Weekly
    News&lt;/a&gt;, Eric Raymond is writing a new spam filter called &lt;a href='http://www.tuxedo.org/~esr/bogofilter/'&gt;&lt;code&gt;bogofilter&lt;/code&gt;&lt;/a&gt;
    based on &lt;a href='http://mathworld.wolfram.com/BayesianAnalysis.html'&gt;Bayesian
    analysis&lt;/a&gt;, as &lt;a href='http://www.paulgraham.com/spam.html'&gt;suggested&lt;/a&gt; by Paul
    Graham.  Unlike the excellent &lt;a href='/stories/2002/08/06/spam-assassin-intro' title='SpamAssassin: A Decent Spam Filter'&gt;SpamAssassin&lt;/a&gt;, which merely requires
    whitelisting a small number of addresses, &lt;code&gt;bogofilter&lt;/code&gt; requires
    training with around 1,000 e-mail messages.  But &lt;code&gt;bogofilter&lt;/code&gt; may
    ultimately offer more hope for defeating spam.&lt;/p&gt;

    &lt;p&gt;Once trained, &lt;code&gt;bogofilter&lt;/code&gt; recognizes most incoming spam
    (allegedly as much as SpamAssassin does, but we'll have to wait and see).
    More importantly, however, &lt;code&gt;bogofilter&lt;/code&gt; is very good at &lt;i&gt;not&lt;/i&gt;
    recognizing legitimate e-mail as spam (in other words, it has a very
    low false positive rate).&lt;/p&gt;

    &lt;p&gt;The secret strength of &lt;code&gt;bogofilter&lt;/code&gt;, however, is the training
    process.  Because &lt;code&gt;bogofilter&lt;/code&gt; is trained by the user, each user gets a
    personalized spam filter.  This means that (1) information of
    professional interest to the reader will generally be recognized as
    non-spam (however incriminating it might otherwise look), and (2) there
    won't be a &lt;a href='http://spamassassin.org/tests.html'&gt;centralized
    list of rules&lt;/a&gt; for the spammer to read.&lt;/p&gt;

    &lt;p&gt;I suspect that the new &lt;a href='http://www.apple.com/macosx/jaguar/mail.html'&gt;Mac OS X 10.2 mail
    client&lt;/a&gt; may be using a similar technique.&lt;/p&gt;</description>
      <pubDate>Fri, 13 Sep 2002 00:00:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:826ec767-f2d8-4171-a27e-cc245718888d</guid>
      <author>Eric</author>
      <link>http://www.randomhacks.net/articles/2002/09/13/bogofilter</link>
      <category>Spam</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/26</trackback:ping>
    </item>
  </channel>
</rss>
