
The Natural Language Toolkit (developerWorks)

The Natural Language Toolkit (developerWorks)

Posted Jun 26, 2004 0:29 UTC (Sat) by iabervon (subscriber, #722)
In reply to: The Natural Language Toolkit (developerWorks) by cpeterso
Parent article: The Natural Language Toolkit (developerWorks)

I don't remember a good example, but most of the programming work is
already done in the Remembrance Agent, available (GPL) from
http://www.remem.org/, which indexes a corpus and can then query it
efficiently. The interface only really supports finding the documents
that match best, but the engine has the words and their scores
internally. It isn't much computation, provided you do an initial scan
and then update the index incrementally; querying the stored information
is too fast to measure on my machine. (Note that what it displays for a
match are the words that matched, not the top words for the document in
general.)
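
The Remembrance Agent's own scoring isn't described here, so as a rough
illustration of the idea (scan the corpus once, update incrementally,
then report which words drove a match) here is a minimal Python sketch
using plain TF-IDF weighting. The class name TinyIndex, the stopword
list and the scoring formula are hypothetical choices of this sketch,
not the Remembrance Agent's engine or interface.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "it", "of", "the", "to", "this"}


class TinyIndex:
    """Hypothetical in-memory TF-IDF index; NOT the Remembrance Agent's engine."""

    def __init__(self):
        self.doc_terms = {}        # doc_id -> Counter of term frequencies
        self.doc_freq = Counter()  # term -> number of documents containing it

    @staticmethod
    def tokenize(text):
        # Lowercase alphabetic words, minus stopwords.
        return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    def add_document(self, doc_id, text):
        # Incremental update: only this document is scanned. If it was
        # indexed before, its old terms are removed from the document counts.
        old = self.doc_terms.get(doc_id)
        if old is not None:
            self.doc_freq.subtract(old.keys())
        terms = Counter(self.tokenize(text))
        self.doc_terms[doc_id] = terms
        self.doc_freq.update(terms.keys())

    def idf(self, term):
        # Classic log(N / df): a word present in every document weighs 0.
        df = self.doc_freq[term]
        if df <= 0:
            return 0.0
        return math.log(len(self.doc_terms) / df)

    def _contributions(self, query_terms, doc_id):
        # Per-word contribution of each shared, non-zero-weight word.
        d_terms = self.doc_terms[doc_id]
        contrib = {}
        for term, qf in query_terms.items():
            if term in d_terms:
                weight = qf * d_terms[term] * self.idf(term) ** 2
                if weight > 0:
                    contrib[term] = weight
        return contrib

    def best_match(self, query):
        # Return the doc_id with the highest total score, or None.
        q = Counter(self.tokenize(query))
        scored = [(sum(self._contributions(q, d).values()), d) for d in self.doc_terms]
        scored = [s for s in scored if s[0] > 0]
        return max(scored)[1] if scored else None

    def matching_words(self, query, doc_id, k=5):
        # The words that matched, ordered by contribution (ties alphabetical),
        # rather than the document's overall top words.
        q = Counter(self.tokenize(query))
        contrib = self._contributions(q, doc_id)
        return sorted(contrib, key=lambda t: (-contrib[t], t))[:k]


if __name__ == "__main__":
    index = TinyIndex()
    index.add_document("kernel-email", "The scheduler patch fixes a latency regression in the kernel.")
    index.add_document("garden-email", "The tomatoes in the garden need more water this week.")
    query = "latency regression in the scheduler"
    print(index.best_match(query))
    print(index.matching_words(query, "kernel-email"))
```

Because the per-document counts live in the index, answering a query
only touches the stored counters, which is why it can be cheap once the
initial scan is done.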

Querying with one of the emails in the list gives, for its match with
itself: "window, throwing, knife, pizza, stuck". I'm not sure whether
this is only clear to me because I remember the email, or whether other
people could also work out what the email was about from that
information.


(Log in to post comments)

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds