The Natural Language Toolkit (developerWorks)
Posted Jun 26, 2004 17:44 UTC (Sat) by jwb (subscriber, #15467)
In reply to: The Natural Language Toolkit (developerWorks) by iabervon
Parent article: The Natural Language Toolkit (developerWorks)
You might be surprised how effective this technique is. Seeded with a single email or email thread, it does a fairly good job of finding clusters of similar emails in a mailbox. It isn't perfect, but it works, and if you allow iterative positive or negative user feedback, it converges on the desired behavior within one or two rounds.
I use rainbow for such work, and unfortunately it can take several hours to stem and lex my maildir :( Maybe in some ideal future someone will come out with an improved libbow.
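For anyone who wants to play with the seed-and-feedback idea without rainbow, here is a rough sketch in plain Python. To be clear, this is not how rainbow or libbow does it: it assumes TF-IDF vectors, cosine similarity, and a Rocchio-style feedback update, and every function name and weight here (tokenize, tfidf_vectors, rank, rocchio, alpha/beta/gamma) is made up for illustration. There is no stemming, and the "mailbox" is just a list of strings.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Crude lexer: lowercase word characters only, no stemming.
    return re.findall(r"[a-z0-9']+", text.lower())


def normalize(vec):
    # Scale a sparse vector (dict of term -> weight) to unit length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs):
    # One unit-length TF-IDF vector per document, using a smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank(query, vectors, exclude=()):
    # Return (score, index) pairs, most similar first.
    scored = [(cosine(query, v), i)
              for i, v in enumerate(vectors) if i not in exclude]
    return sorted(scored, reverse=True)


def rocchio(query, vectors, positive, negative,
            alpha=1.0, beta=0.75, gamma=0.25):
    # Move the query toward liked messages and away from disliked ones.
    # The weights are conventional defaults, assumed here, not tuned.
    new = defaultdict(float)
    for t, w in query.items():
        new[t] += alpha * w
    for i in positive:
        for t, w in vectors[i].items():
            new[t] += beta * w / len(positive)
    for i in negative:
        for t, w in vectors[i].items():
            new[t] -= gamma * w / len(negative)
    # Terms pushed below zero are dropped rather than kept as negatives.
    return normalize({t: w for t, w in new.items() if w > 0})


if __name__ == "__main__":
    mailbox = [
        "patch for the kernel scheduler, please review",
        "re: patch for the kernel scheduler looks good",
        "cheap watches and pills, buy now",
        "kernel scheduler regression after patch",
    ]
    vecs = tfidf_vectors(mailbox)
    query = vecs[0]                      # seed with message 0
    print(rank(query, vecs, exclude={0}))
    # One round of feedback: message 1 is wanted, message 2 is not.
    query = rocchio(query, vecs, positive=[1], negative=[2])
    print(rank(query, vecs, exclude={0, 1, 2}))
```

Each feedback round just rebuilds the query vector and re-ranks, so the per-round cost is small once the mailbox is vectorized; the expensive part on a real maildir would still be the initial lexing.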