The Natural Language Toolkit (developerWorks)
Posted Jun 25, 2004 20:08 UTC (Fri) by iabervon (subscriber, #722)
Parent article:
The Natural Language Toolkit (developerWorks)
One thing I've thought for ages that someone should write is a mail client plug-in that lists mail messages by their top five Porter-stemmed words, ranked by term frequency-inverse document frequency (TF-IDF). When I was playing with automatic topic extraction, I found that this feature was quite distinctive, and often more useful than the subject line provided by the sender.
Now I miss reading words like "whatev" and "relat". Also, the singular of "corpora" is "corpus", not "corporum".
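For anyone who wants to try the idea, here is a rough sketch of the ranking step in Python. It is only an illustration under some loud assumptions: `crude_stem` is a hypothetical stand-in that strips a few suffixes and is not the Porter algorithm (a real stemmer, such as the one in NLTK, would go there), the TF-IDF weighting is the plain tf * log(N/df) variant, and `top_terms` and its parameters are names made up for this example.

```python
import math
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical stand-in for a Porter stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_terms(messages, k=5):
    """Return, for each message, its k highest-scoring stems by TF-IDF."""
    docs = [Counter(tokenize(m)) for m in messages]
    n = len(docs)

    # Document frequency: in how many messages does each stem appear?
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    result = []
    for counts in docs:
        total = sum(counts.values())
        # Plain tf * log(N / df); a stem found in every message scores 0.
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:k])
    return result


if __name__ == "__main__":
    mailbox = [
        "The parser crashed while parsing the corpus",
        "The meeting is moved to Friday",
        "The corpus license changed",
    ]
    for body, terms in zip(mailbox, top_terms(mailbox)):
        print(" ".join(terms), "<-", body)
```

A plug-in would feed message bodies from the mailbox into something like `top_terms` and show the resulting stems in place of, or next to, the subject column. With only a handful of messages the scores are not very meaningful; the document frequencies need a reasonably large mailbox before the rankings become distinctive.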