The Natural Language Toolkit (developerWorks)

Posted Jun 26, 2004 17:44 UTC (Sat) by jwb (subscriber, #15467)
In reply to: The Natural Language Toolkit (developerWorks) by iabervon
Parent article: The Natural Language Toolkit (developerWorks)

You might be surprised how effective this technique is. By seeding with a single email or email thread, you can find clusters of similar emails in a mailbox fairly reliably. It isn't perfect, but it works, and if you allow iterative positive or negative user feedback, it converges on the desired behavior within one or two rounds.
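To make the idea concrete, here is a rough sketch in Python of seeding with one message and then refining with user feedback. This is not how rainbow does it internally; it assumes plain TF-IDF vectors, cosine similarity, and a Rocchio-style feedback step, and every function name and weight in it (tfidf_vectors, rank, refine, beta, gamma) is made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude lexer: lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    # Scale a sparse vector (dict of term -> weight) to unit length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs):
    # One unit-length TF-IDF vector per document, with smoothed IDF.
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n = len(docs)
    return [
        normalize({t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0)
                   for t, tf in c.items()})
        for c in counts
    ]


def cosine(a, b):
    # Both inputs are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank(query, vectors, exclude=()):
    # Most similar documents first, as (score, index) pairs.
    scored = [(cosine(query, v), i)
              for i, v in enumerate(vectors) if i not in exclude]
    return sorted(scored, reverse=True)


def refine(query, vectors, liked=(), disliked=(), beta=0.75, gamma=0.25):
    # Rocchio-style update: pull the query toward liked messages and
    # push it away from disliked ones. beta and gamma are arbitrary
    # illustrative weights, not tuned values.
    new = dict(query)
    for group, sign, weight in ((liked, 1.0, beta), (disliked, -1.0, gamma)):
        for i in group:
            for t, w in vectors[i].items():
                new[t] = new.get(t, 0.0) + sign * weight * w / len(group)
    # Drop terms whose weight went non-positive, then renormalize.
    return normalize({t: w for t, w in new.items() if w > 0})


messages = [
    "Kernel patch for the scheduler: please review the scheduler patch",
    "Re: kernel patch for the scheduler, review comments inline",
    "Lunch on Friday? Pizza place near the office",
    "Re: lunch on Friday, pizza sounds good",
    "Scheduler regression in the latest kernel release",
]

vectors = tfidf_vectors(messages)
seed = 0
first_pass = rank(vectors[seed], vectors, exclude={seed})

# One round of feedback: message 1 is on topic, message 2 is not.
query = refine(vectors[seed], vectors, liked=[1], disliked=[2])
second_pass = rank(query, vectors, exclude={seed})

for score, i in second_pass:
    print(f"{score:.3f}  {messages[i]}")
```

Each feedback round is just another call to refine followed by rank, so running "one or two rounds" means repeating those two calls with whatever the user marked. On a real maildir you would want stemming and stop-word removal before the TF-IDF step, which this sketch leaves out.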

I use rainbow for this kind of work. Unfortunately, it can take several hours to stem and lex my maildir :( Maybe in some ideal future someone will come out with an improved libbow.


Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds