<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Random Hacks: Tag Python</title>
    <link>http://www.randomhacks.net/articles/tag/Python?tag=Python</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Technology and Other Fun Stuff</description>
    <item>
      <title>Visualizing WordNet relationships as graphs</title>
      <description>&lt;p&gt;The &lt;a href="http://wordnet.princeton.edu/"&gt;WordNet&lt;/a&gt; database contains all sorts of interesting relationships between words: it can categorize words into hierarchies, find the parts of an object, and answer many other interesting questions.&lt;/p&gt;

&lt;p&gt;The code below relies on the &lt;a href="http://www.nltk.org/"&gt;NLTK&lt;/a&gt; and &lt;a href="http://networkx.lanl.gov/"&gt;NetworkX&lt;/a&gt; libraries for Python.&lt;/p&gt;

&lt;h3&gt;Categorizing words&lt;/h3&gt;

&lt;p&gt;What, exactly, is a dog? It&amp;#8217;s a domestic animal and a carnivore, not to mention a physical entity (as opposed to an abstract entity, such as an idea). WordNet knows all these facts:&lt;/p&gt;

&lt;p&gt;&lt;a href="/files/dog.png"&gt;&lt;img src="/files/dog.png" width="406" height="306" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How do we generate this image? First, we look up the first entry for &amp;#8220;dog&amp;#8221; in WordNet. This returns a &amp;#8220;synset&amp;#8221;, or a set of words with equivalent meanings.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;dog = wn.synset('dog.n.01')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, we compute the &lt;a href="http://en.wikipedia.org/wiki/Transitive_closure"&gt;transitive closure&lt;/a&gt; of the &lt;a href="http://en.wikipedia.org/wiki/Hyponymy"&gt;hypernym&lt;/a&gt; relationship, or (in English) we look for all the categories to which &amp;#8220;dog&amp;#8221; belongs, and all the categories to which &lt;em&gt;those&lt;/em&gt; categories belong, recursively:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;graph = closure_graph(dog,
                      lambda s: s.hypernyms())&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After that, we just pass the resulting graph to &lt;a href="http://networkx.lanl.gov/"&gt;NetworkX&lt;/a&gt; for display:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;nx.draw_graphviz(graph)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;The implementation&lt;/h3&gt;

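&lt;p&gt;The &lt;code&gt;closure_graph&lt;/code&gt; function calls &lt;code&gt;fn&lt;/code&gt; on the supplied synset, then recursively on every synset that &lt;code&gt;fn&lt;/code&gt; returns, and uses the results to build a &lt;a href="http://networkx.lanl.gov/"&gt;NetworkX&lt;/a&gt; graph. A &lt;code&gt;seen&lt;/code&gt; set ensures that each synset is expanded only once, even when several paths lead to it. This code goes at the top of the file, so you can use &lt;code&gt;wn&lt;/code&gt; and &lt;code&gt;nx&lt;/code&gt; in your own code.&lt;/p&gt;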

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;from nltk.corpus import wordnet as wn
import networkx as nx

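def closure_graph(synset, fn):
    """Build a directed graph of every synset reachable from synset via fn."""
    seen = set()
    graph = nx.DiGraph()

    def recurse(s):
        if s not in seen:
            seen.add(s)
            # In NLTK 3, Synset.name() is a method; older releases used
            # a plain .name attribute.
            graph.add_node(s.name())
            for s1 in fn(s):
                graph.add_node(s1.name())
                graph.add_edge(s.name(), s1.name())
                recurse(s1)

    recurse(synset)
    return graph&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;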
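
&lt;p&gt;If you&amp;#8217;d like to experiment with the traversal itself without downloading the WordNet data, here is a rough, stdlib-only sketch of the same idea. It makes a few assumptions: &lt;code&gt;HYPERNYMS&lt;/code&gt; is a small hand-written stand-in for WordNet, &lt;code&gt;closure_edges&lt;/code&gt; is a hypothetical name, and it walks the graph with an explicit stack instead of recursion, returning a plain set of edges rather than a NetworkX graph.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;# Hand-written toy hierarchy (not real WordNet data).
HYPERNYMS = {
    'dog': ['domestic_animal', 'canine'],
    'domestic_animal': ['animal'],
    'canine': ['carnivore'],
    'carnivore': ['placental'],
    'placental': ['mammal'],
    'mammal': ['vertebrate'],
    'vertebrate': ['chordate'],
    'chordate': ['animal'],
    'animal': ['organism'],
    'organism': [],
}

def closure_edges(start, fn):
    """Return every (source, target) edge reachable from start via fn."""
    seen = {start}
    stack = [start]
    edges = set()
    while stack:
        node = stack.pop()
        for target in fn(node):
            edges.add((node, target))
            # Expand each node only once, like the seen set in closure_graph.
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return edges

edges = closure_edges('dog', lambda s: HYPERNYMS.get(s, []))
print(len(edges))  # 10&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The explicit stack sidesteps Python&amp;#8217;s recursion limit on very deep relations; for WordNet hypernym chains, which are fairly short, the recursive version above works fine.&lt;/p&gt;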

&lt;p&gt;By using a high-quality graph library, we make it much easier to merge, analyze and display our graphs.&lt;/p&gt;
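
&lt;p&gt;To make &amp;#8220;merge&amp;#8221; and &amp;#8220;analyze&amp;#8221; a bit more concrete, here is the underlying idea reduced to plain Python sets. This is a sketch with a hand-written toy hierarchy standing in for WordNet, and &lt;code&gt;ancestors&lt;/code&gt; is a hypothetical helper: merging two closures is a set union, and the categories two words share are the intersection.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;# Hand-written toy hierarchy (not real WordNet data).
HYPERNYMS = {
    'dog': ['canine'],
    'cat': ['feline'],
    'canine': ['carnivore'],
    'feline': ['carnivore'],
    'carnivore': ['mammal'],
    'mammal': ['animal'],
    'animal': [],
}

def ancestors(word):
    """Every category reachable from word by following HYPERNYMS."""
    found = set()
    stack = list(HYPERNYMS.get(word, []))
    while stack:
        w = stack.pop()
        if w not in found:
            found.add(w)
            stack.extend(HYPERNYMS.get(w, []))
    return found

dog, cat = ancestors('dog'), ancestors('cat')
merged = dog.union(cat)          # everything in either closure
shared = dog.intersection(cat)   # categories both words belong to
print(sorted(shared))  # ['animal', 'carnivore', 'mammal']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;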

&lt;h3&gt;More graphs&lt;/h3&gt;

&lt;p&gt;Parts of the finger, generated with &lt;code&gt;synset('finger.n.01')&lt;/code&gt; and &lt;code&gt;part_meronyms&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="/files/wn_finger.png"&gt;&lt;img src="/files/wn_finger.png" width="406" height="306" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Types of running, generated with &lt;code&gt;synset('run.v.01')&lt;/code&gt; and &lt;code&gt;hyponyms&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="/files/wn_run.png"&gt;&lt;img src="/files/wn_run.png" width="406" height="306" /&gt;&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Tue, 29 Dec 2009 20:38:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:bf20469d-bdce-4636-a7f1-33579f49b54c</guid>
      <author>Eric Kidd</author>
      <link>http://www.randomhacks.net/articles/2009/12/29/visualizing-wordnet-relationships-as-graphs</link>
      <category>Python</category>
      <category>NLP</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/739</trackback:ping>
    </item>
    <item>
      <title>Experimenting with NLTK</title>
      <description>&lt;p&gt;The &lt;a href="http://www.nltk.org/"&gt;Natural Language Toolkit&lt;/a&gt; for Python is a great framework for simple, non-probabilistic natural language processing. Here are some example snippets (and some trouble-shooting notes).&lt;/p&gt;

&lt;h3&gt;Concordances&lt;/h3&gt;

&lt;p&gt;We can search for &amp;#8220;dog&amp;#8221; in &lt;a href="http://www.gutenberg.org/etext/1695"&gt;Chesterton&amp;#8217;s &lt;em&gt;The Man Who Was Thursday&lt;/em&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;&amp;gt;&amp;gt;&amp;gt; from nltk.book import *
&amp;gt;&amp;gt;&amp;gt; text9.concordance(&amp;quot;dog&amp;quot;, width=40)
Displaying 4 of 4 matches:
ead of a cat or a dog , it could not ha
d you ever hear a dog bark like that ?&amp;quot;
aid , &amp;quot; is that a dog -- anybody ' s do
og -- anybody ' s dog ?&amp;quot; There broke up&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
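
&lt;p&gt;Each match is printed with the keyword lined up in the same column. Here is a rough sketch of that layout in plain Python, assuming the text is already split into tokens. &lt;code&gt;concordance_lines&lt;/code&gt; is a hypothetical helper rather than NLTK&amp;#8217;s implementation, so its output won&amp;#8217;t match NLTK&amp;#8217;s character for character:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;def concordance_lines(tokens, word, width=40):
    """Keyword-in-context lines, roughly width characters wide."""
    # Characters of context available on each side of ' word '.
    half = max((width - len(word) - 2) // 2, 1)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = ' '.join(tokens[:i])[-half:]
            right = ' '.join(tokens[i + 1:])[:half]
            # Right-justify the left context so every keyword aligns.
            lines.append(left.rjust(half) + ' ' + tok + ' ' + right)
    return lines

tokens = 'the dog saw another dog'.split()
for line in concordance_lines(tokens, 'dog', width=20):
    print(repr(line))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;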

&lt;h3&gt;Synonyms and categories&lt;/h3&gt;

&lt;p&gt;We can use WordNet to look up synonyms:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;from nltk.corpus import wordnet

dog = wordnet.synset('dog.n.01')
print(dog.lemma_names())&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This prints:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;['dog', 'domestic_dog', 'Canis_familiaris']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can also look up the &amp;#8220;hypernyms&amp;#8221;, or larger categories that include the word &amp;#8220;dog&amp;#8221;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;paths = dog.hypernym_paths()

def simple_path(path):
    # Use the first lemma's name as a readable label for each synset.
    return [s.lemmas()[0].name() for s in path]

for path in paths:
    print(simple_path(path))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This prints:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;['entity', 'physical_entity', 'object',
 'whole', 'living_thing', 'organism',
 'animal', 'domestic_animal', 'dog']
['entity', 'physical_entity', 'object',
 'whole', 'living_thing', 'organism',
 'animal', 'chordate', 'vertebrate',
 'mammal', 'placental', 'carnivore',
 'canine', 'dog']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
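
&lt;p&gt;There are two paths because &amp;#8220;dog&amp;#8221; has two direct hypernyms (&lt;code&gt;domestic_animal&lt;/code&gt; and &lt;code&gt;canine&lt;/code&gt;), so the hierarchy is a directed acyclic graph rather than a tree. A minimal recursive sketch of how such paths can be enumerated, using a hand-copied fragment of the output above (cut off at &lt;code&gt;animal&lt;/code&gt;) instead of WordNet; &lt;code&gt;paths_from_root&lt;/code&gt; is hypothetical and is not NLTK&amp;#8217;s own code:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;# Fragment copied from the paths printed above, trimmed to start at
# 'animal'; a toy stand-in, not loaded from WordNet.
HYPERNYMS = {
    'dog': ['domestic_animal', 'canine'],
    'domestic_animal': ['animal'],
    'canine': ['carnivore'],
    'carnivore': ['placental'],
    'placental': ['mammal'],
    'mammal': ['vertebrate'],
    'vertebrate': ['chordate'],
    'chordate': ['animal'],
    'animal': [],
}

def paths_from_root(word):
    """Every path from a top-level category down to word."""
    parents = HYPERNYMS.get(word, [])
    if not parents:
        # No hypernyms: word is itself a root.
        return [[word]]
    # Extend each path that reaches a parent by one more step.
    return [path + [word]
            for parent in parents
            for path in paths_from_root(parent)]

for path in paths_from_root('dog'):
    print(path)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;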

&lt;p&gt;For more neat examples, take a look at the &lt;a href="http://www.nltk.org/book"&gt;NLTK book&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Installation notes&lt;/h3&gt;

&lt;p&gt;While setting up NLTK, I bumped into a few problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The &lt;code&gt;dispersion_plot&lt;/code&gt; function returns immediately without displaying anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;a href="http://matplotlib.sourceforge.net/users/shell.html#mpl-shell"&gt;Configure your matplotlib back-end correctly.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The &lt;code&gt;nltk.app.concordance()&lt;/code&gt; GUI fails with the error:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;out of stack space (infinite loop?)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;a href="http://code.google.com/p/nltk/issues/detail?id=445"&gt;Recompile Tcl with threads.&lt;/a&gt; On the Mac:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_sh "&gt;sudo port install tcl +threads&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 28 Dec 2009 21:31:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:d23d8142-890b-4aed-ad8c-38b73ffc102a</guid>
      <author>Eric Kidd</author>
      <link>http://www.randomhacks.net/articles/2009/12/28/experimenting-with-nltk</link>
      <category>Python</category>
      <category>NLP</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/738</trackback:ping>
    </item>
    <item>
      <title>Interesting Python libraries for natural language processing</title>
      <description>&lt;p&gt;I&amp;#8217;ve been looking at various libraries for natural language processing, and I&amp;#8217;m pleasantly surprised by the tools created by the Python community. Some examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Python &lt;a href="http://www.nltk.org/"&gt;NLTK&lt;/a&gt; library provides parsers for many popular copora, visualization tools, and a wide variety of simple natural language algorithms (though few of these are probabilistic). Highlights include:
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://wordnet.princeton.edu/"&gt;WordNet&lt;/a&gt; support.&lt;/li&gt;
&lt;li&gt;NumPy integration (see below).&lt;/li&gt;
&lt;li&gt;An accessible &lt;a href="http://www.nltk.org/book"&gt;introductory book on natural language processing&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://conceptnet.media.mit.edu/"&gt;ConceptNet&lt;/a&gt; provides a simple semantic model of the world.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://numpy.scipy.org/"&gt;NumPy&lt;/a&gt; (and &lt;a href="http://www.scipy.org/"&gt;SciPy&lt;/a&gt;) provide extensive support for linear algebra and data visualization.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mathema.tician.de/software/pycuda"&gt;PyCUDA&lt;/a&gt; provides access to Nvidia GPUs for high-performance scientific computation, and it integrates with NumPy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you need to build a web crawler, there&amp;#8217;s &lt;a href="http://twistedmatrix.com/trac/"&gt;Twisted&lt;/a&gt;, which makes it easy to write fast, asynchronous networking code.&lt;/p&gt;

&lt;p&gt;All in all, I usually prefer Ruby to Python, because I love &lt;a href="http://www.randomhacks.net/articles/2005/12/03/why-ruby-is-an-acceptable-lisp"&gt;Ruby&amp;#8217;s metaprogramming support&lt;/a&gt;. But the Python community has built an impressive variety of scientific and linguistic tools. Many thanks to everybody who contributed to these projects!&lt;/p&gt;</description>
      <pubDate>Mon, 28 Dec 2009 15:56:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:41c8c423-bcb5-4d98-bb64-ede7ca42792f</guid>
      <author>Eric Kidd</author>
      <link>http://www.randomhacks.net/articles/2009/12/28/interesting-python-libraries-for-natural-language-processing</link>
      <category>Python</category>
      <category>NLP</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/737</trackback:ping>
    </item>
    <item>
      <title>Bayesian Whitelisting: Finding the Good Mail Among the Spam</title>
      <description>    &lt;p&gt;The biggest challenge with spam filtering is reducing false
    positives--that is, finding the good mail among the spam.  Even the
    best spam filters occasionally mistake legitimate e-mail for spam.  For
    example, in some &lt;a href='/stories/2002/09/22/trainable-spam-filter-testing' title='How To Test a Trainable Spam Filter'&gt;recent
    tests&lt;/a&gt;, &lt;a href='http://bogofilter.sourceforge.net/'&gt;&lt;code&gt;bogofilter&lt;/code&gt;&lt;/a&gt;
    processed 18,000 e-mails with only 34 false positives.  Unfortunately,
    several of these false positives were urgent e-mails from former
    clients.  This unpleasant mistake wasn't necessary--the most important
    of these false positives could have been avoided with an automatic
    whitelisting system.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.randomhacks.net/articles/2002/09/29/bayesian-whitelisting"&gt;Read More&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sun, 29 Sep 2002 00:00:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:4e3b83e6-1f2a-48d4-9f65-5e691ab45838</guid>
      <author>Eric</author>
      <link>http://www.randomhacks.net/articles/2002/09/29/bayesian-whitelisting</link>
      <category>Spam</category>
      <category>Hacks</category>
      <category>Python</category>
      <category>Recommended</category>
      <category>Probability</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/38</trackback:ping>
    </item>
  </channel>
</rss>

