
Recoll: A search engine for the Linux desktop (Linux.com)


Posted Apr 24, 2007 9:24 UTC (Tue) by oever (subscriber, #987)
In reply to: Recoll: A search engine for the Linux desktop (Linux.com) by drag
Parent article: Recoll: A search engine for the Linux desktop (Linux.com)

Nepomuk-KDE is not a search tool but a metadata framework for the next KDE version. It uses Strigi for metadata extraction, indexing and searching. Strigi can use different indexes: one can simply write an index backend and plug it in. Nepomuk-KDE has implemented an indexing backend for Strigi that uses an RDF store. Strigi itself has virtually no dependencies (libz, libbz2, libxml2), in contrast to the other search engines.
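To make the "pluggable index backend" idea concrete, here is a minimal Python sketch, assuming a simple add/query interface. Strigi's real backend interface is written in C++ and its names and signatures differ. Every class and function below (`IndexBackend`, `InvertedIndexBackend`, `TripleStoreBackend`, `index_file`) is hypothetical. The point is only that the extraction side talks to an abstract interface, so a full-text index and an RDF-style triple store can be swapped without touching the extractor.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class IndexBackend(ABC):
    """Hypothetical backend interface: store extracted fields, answer term queries."""

    @abstractmethod
    def add(self, uri: str, fields: Dict[str, str]) -> None: ...

    @abstractmethod
    def query(self, term: str) -> Set[str]: ...


class InvertedIndexBackend(IndexBackend):
    """Classic full-text backend: maps each lower-cased word to the URIs containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uri: str, fields: Dict[str, str]) -> None:
        for value in fields.values():
            for word in value.lower().split():
                self._postings[word].add(uri)

    def query(self, term: str) -> Set[str]:
        # .get() avoids inserting empty entries for unknown terms
        return set(self._postings.get(term.lower(), set()))


class TripleStoreBackend(IndexBackend):
    """Toy stand-in for an RDF store: keeps (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: List[Tuple[str, str, str]] = []

    def add(self, uri: str, fields: Dict[str, str]) -> None:
        for predicate, obj in fields.items():
            self.triples.append((uri, predicate, obj))

    def query(self, term: str) -> Set[str]:
        # Linear scan over objects; a real RDF store would use its own indexes
        t = term.lower()
        return {s for s, _, o in self.triples if t in o.lower().split()}


def index_file(uri: str, fields: Dict[str, str], backend: IndexBackend) -> None:
    """The extraction side only sees the abstract interface, so backends are swappable."""
    backend.add(uri, fields)


if __name__ == "__main__":
    fields = {"title": "Strigi notes", "mimetype": "text/plain"}
    for backend in (InvertedIndexBackend(), TripleStoreBackend()):
        index_file("file:///home/user/notes.txt", fields, backend)
        print(type(backend).__name__, backend.query("strigi"))
```

Both backends answer the same query from the same extracted fields. Only the storage model differs, which is the kind of separation that lets a project like Nepomuk-KDE supply its own RDF-based backend.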

The speed of data extraction in Strigi is unrivalled. It can extract data from deeply nested files, e.g. from a text file in a zip file attached to an email. This is because it is the only indexer that uses stream-based file analysis. This speed is available to the other search engines too: all they need to do is use the 'xmlindexer' executable that comes with Strigi or link against the library 'libstreamanalyzer'.
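The nested case mentioned above (a text file in a zip file attached to an email) can be illustrated with a rough Python sketch of recursive extraction that never writes intermediate files to disk. This is an approximation, not libstreamanalyzer's API or algorithm. The function names `analyze` and `build_sample` are hypothetical, and dispatch by file extension and zip magic bytes is an assumption. Unlike true stream-based analysis, this sketch buffers each member fully in memory before descending into it.

```python
import email
import io
import zipfile
from email import policy
from email.message import EmailMessage
from typing import Iterator, Tuple

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a zip


def analyze(name: str, data: bytes) -> Iterator[Tuple[str, str]]:
    """Recursively yield (nested_path, text) for every .txt leaf inside data.

    Containers are decoded from their parent's bytes in memory; nothing is
    written to a temporary file. The dispatch rules are illustrative assumptions.
    """
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from analyze(f"{name}/{info.filename}", zf.read(info))
    elif name.endswith(".eml"):
        msg = email.message_from_bytes(data, policy=policy.default)
        for part in msg.walk():
            filename = part.get_filename()
            if filename:  # only descend into attachments, not the message body
                payload = part.get_payload(decode=True)
                yield from analyze(f"{name}/{filename}", payload)
    elif name.endswith(".txt"):
        yield name, data.decode("utf-8", errors="replace")


def build_sample() -> bytes:
    """Build an email whose zip attachment contains a text file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("notes.txt", "hello strigi")
    msg = EmailMessage()
    msg["Subject"] = "nested sample"
    msg.set_content("see attachment")
    msg.add_attachment(buf.getvalue(), maintype="application",
                       subtype="zip", filename="archive.zip")
    return msg.as_bytes()


if __name__ == "__main__":
    for path, text in analyze("mail.eml", build_sample()):
        print(path, "->", text)
```

The recursion mirrors the nesting: each container is opened from its parent's decoded bytes, and the path records how the leaf was reached. A genuinely stream-based analyzer would go further and read members incrementally instead of materializing them.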

KDE4 uses libstreamanalyzer to provide desktop applications with metadata. This ensures consistency between the data in the search index and the data shown in applications.



Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 11:18 UTC (Tue) by drag (subscriber, #31333)

I wouldn't go around saying you're the fastest this or the fastest that unless you can actually back it up with something.

Tracker is very fast for me. Its impact is unnoticeable even on laptops when it's started for the first time. After running for several days it is still using only 7 MB RSS.

Also it's FUSE friendly, which I like since I serve all my media files over sshfs. How does Strigi work on FUSE?

I dunno. I'm willing to try anything, and unfortunately it seems like the Tracker developers really aren't that active.

I know that it's going to be a while before we get down to one or two engines that people like overall.

Does anybody have any experience with anything else?

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 11:42 UTC (Tue) by oever (subscriber, #987)

Here's a comparison. Note that it is somewhat outdated.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 17:20 UTC (Tue) by eklitzke (subscriber, #36426)

That benchmark was released about a week before the latest Tracker release, which was supposed to massively improve Tracker's speed. According to the Tracker website, indexing is _much_ faster now, and they claim to be able to index 100 files per second on ext3 (which, according to them, is basically the maximum possible speed once I/O time is taken into account). That is more than twice as fast as Strigi in the benchmark you posted, although it goes without saying that the tests would need to be run on the same machine to be truly comparable.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 19:03 UTC (Tue) by superstoned (subscriber, #33164)

Still, does Tracker have the deep indexing feature?

Anyway, numbers would be good. Maybe I can try to provide some, or LWN.net will figure something out ;-)

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.