<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Random Hacks: Experimenting with NLTK</title>
    <link>http://www.randomhacks.net/articles/2009/12/28/experimenting-with-nltk</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Technology and Other Fun Stuff</description>
    <item>
      <title>Experimenting with NLTK</title>
      <description>&lt;p&gt;The &lt;a href="http://www.nltk.org/"&gt;Natural Language Toolkit&lt;/a&gt; for Python is a great framework for simple, non-probabilistic natural language processing. Here are some example snippets (and some troubleshooting notes).&lt;/p&gt;

&lt;h3&gt;Concordances&lt;/h3&gt;

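&lt;p&gt;A concordance lists every occurrence of a word along with its surrounding context. We can search for &amp;#8220;dog&amp;#8221; in &lt;a href="http://www.gutenberg.org/etext/1695"&gt;Chesterton&amp;#8217;s &lt;em&gt;The Man Who Was Thursday&lt;/em&gt;&lt;/a&gt;, which &lt;code&gt;nltk.book&lt;/code&gt; loads as &lt;code&gt;text9&lt;/code&gt;:&lt;/p&gt;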

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;&amp;gt;&amp;gt;&amp;gt; from nltk.book import *
&amp;gt;&amp;gt;&amp;gt; text9.concordance(&amp;quot;dog&amp;quot;, width=40)
Displaying 4 of 4 matches:
ead of a cat or a dog , it could not ha
d you ever hear a dog bark like that ?&amp;quot;
aid , &amp;quot; is that a dog -- anybody ' s do
og -- anybody ' s dog ?&amp;quot; There broke up&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
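
&lt;p&gt;To make the output format concrete, here is a rough pure-Python sketch of how such a display can be built. It is not the NLTK implementation: the &lt;code&gt;concordance&lt;/code&gt; helper and the sample &lt;code&gt;tokens&lt;/code&gt; list are made up for illustration, and the centering rule is an assumption chosen to mimic the lines above.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;def concordance(tokens, word, width=40):
    '''Return one context line per match of word (hypothetical helper).'''
    # Characters of context kept on each side of the match (an assumption).
    half = (width - len(word) - 2) // 2
    lines = []
    for i, tok in enumerate(tokens):
        # This sketch matches whole tokens and ignores case.
        if tok.lower() == word.lower():
            left = ' '.join(tokens[:i])[-half:]
            right = ' '.join(tokens[i + 1:])[:half]
            # Pad the left side so every match lines up in the same column.
            lines.append(left.rjust(half) + ' ' + tok + ' ' + right)
    return lines

# Made-up sample sentence, already split into tokens.
tokens = 'instead of a cat or a dog , it could not have been'.split()
for line in concordance(tokens, 'dog'):
    print(line)
# prints: ead of a cat or a dog , it could not ha&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Cutting the context by characters rather than by whole words is what produces the clipped fragments at both ends of each line.&lt;/p&gt;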

&lt;h3&gt;Synonyms and categories&lt;/h3&gt;

&lt;p&gt;WordNet groups words into sets of synonyms, called &amp;#8220;synsets&amp;#8221;. We can look up a synset and list the words in it:&lt;/p&gt;

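&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;from nltk.corpus import wordnet

# Look up the first noun sense of 'dog'.
dog = wordnet.synset('dog.n.01')
# In NLTK 3, lemma_names() is a method; older releases exposed it as an attribute.
print(dog.lemma_names())&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;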

&lt;p&gt;This prints:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;['dog', 'domestic_dog', 'Canis_familiaris']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can also look up the &amp;#8220;hypernyms&amp;#8221;, or larger categories that include the word &amp;#8220;dog&amp;#8221;:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;paths = dog.hypernym_paths()

def simple_path(path):
    # Label each synset with its first lemma name (NLTK 3 method-call syntax).
    return [s.lemmas()[0].name() for s in path]

for path in paths:
    print(simple_path(path))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This prints (wrapped here to fit the page):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;['entity', 'physical_entity', 'object',
 'whole', 'living_thing', 'organism',
 'animal', 'domestic_animal', 'dog']
['entity', 'physical_entity', 'object',
 'whole', 'living_thing', 'organism',
 'animal', 'chordate', 'vertebrate',
 'mammal', 'placental', 'carnivore',
 'canine', 'dog']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
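
&lt;p&gt;Two paths come back because &amp;#8220;dog&amp;#8221; has two direct parents, and each parent leads to the root by its own route. A minimal sketch of that idea follows, using a small hand-written graph rather than WordNet data. The &lt;code&gt;HYPERNYMS&lt;/code&gt; table and this &lt;code&gt;hypernym_paths&lt;/code&gt; function are hypothetical stand-ins, not the NLTK code.&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_python "&gt;# Toy hypernym graph: each term maps to its direct parent categories.
# Hand-written for illustration; this is not WordNet data.
HYPERNYMS = {
    'dog': ['canine', 'domestic_animal'],
    'canine': ['carnivore'],
    'carnivore': ['animal'],
    'domestic_animal': ['animal'],
    'animal': ['entity'],
    'entity': [],
}

def hypernym_paths(term):
    '''Return every root-to-term path, most general category first.'''
    parents = HYPERNYMS[term]
    if not parents:
        # A term with no parents is a root, so its only path is itself.
        return [[term]]
    paths = []
    for parent in parents:
        # Extend each path that reaches the parent with the current term.
        for path in hypernym_paths(parent):
            paths.append(path + [term])
    return paths

for path in hypernym_paths('dog'):
    print(path)
# prints:
# ['entity', 'animal', 'carnivore', 'canine', 'dog']
# ['entity', 'animal', 'domestic_animal', 'dog']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;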

&lt;p&gt;For more neat examples, take a look at the &lt;a href="http://www.nltk.org/book"&gt;NLTK book&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Installation notes&lt;/h3&gt;

&lt;p&gt;While setting up NLTK, I bumped into a few problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The &lt;code&gt;dispersion_plot&lt;/code&gt; function returns immediately without displaying anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;a href="http://matplotlib.sourceforge.net/users/shell.html#mpl-shell"&gt;Configure your matplotlib back-end correctly.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The &lt;code&gt;nltk.app.concordance()&lt;/code&gt; GUI fails with the error:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_default "&gt;out of stack space (infinite loop?)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;a href="http://code.google.com/p/nltk/issues/detail?id=445"&gt;Recompile Tcl with threads.&lt;/a&gt; On a Mac with MacPorts:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_sh "&gt;sudo port install tcl +threads&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
      <pubDate>Mon, 28 Dec 2009 21:31:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:d23d8142-890b-4aed-ad8c-38b73ffc102a</guid>
      <author>Eric Kidd</author>
      <link>http://www.randomhacks.net/articles/2009/12/28/experimenting-with-nltk</link>
      <category>Python</category>
      <category>NLP</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/738</trackback:ping>
    </item>
  </channel>
</rss>
