Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 2:27 UTC (Tue) by drag (subscriber, #31333)
Parent article: Recoll: A search engine for the Linux desktop (Linux.com)

Wow. I had no idea.

Looks like we will have nearly as many desktop search tools as we do window managers.

I found this freedesktop.org project that tries to unify them: Xesam, previously known as Wasabi.
http://wiki.freedesktop.org/wiki/XesamAbout

It is working with:
Beagle, Tracker, Recoll, Pinot, Strigi, and Nepomuk-KDE.

Is there a technical reason why we have so many different search engines? Is it just people trying different things, or did they each decide to build their own in comparative isolation?

Just curious. I used Beagle for a while, but now I enjoy the lightweight design and FUSE-friendliness of Tracker.



Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 6:22 UTC (Tue) by Zero_Dogg (subscriber, #31310) [Link]

I agree, we need a common search backend (possibly with a simple plugin architecture so that additional file types and such can be added), and then we can just have different interfaces: one official interface per desktop environment (GNOME Desktop Search, KDE Desktop Search, Xfce Desktop Search...), all using the same backend. That would probably improve the quality of the search tools available and reduce the amount of duplicated work.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 8:46 UTC (Tue) by khim (subscriber, #9252) [Link]

One backend makes sense when there is a well-defined problem. Search is definitely not a well-defined problem (why do we have Google, Yahoo, etc.?). Sure, if you have a limited number of files to search, you don't need any search engine at all: grep is enough. If you have a lot of files, you have a tough (and not well-defined) problem.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 11:29 UTC (Tue) by jond (subscriber, #37669) [Link]

For me, having a choice of backends is important. I've tried and tried to use and like beagle, but it just keeps busting my machine. The last time I got OOM because of a rogue beagle indexing process was the final straw.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 18:34 UTC (Tue) by superstoned (subscriber, #33164) [Link]

The Strigi author started talks about a common interface between search
systems, so apps can just use D-Bus to talk to whatever is running.

But I don't get what you want. You want 1 search engine, and all apps
using it? With a plugin interface? Lovely, but each of the current projects
tries to BE that search engine. They all have a plugin interface, and try
to be a desktop-independent standard... So who's gonna have to abandon
their project? Are you going to tell them? And will they listen?

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 9:24 UTC (Tue) by oever (subscriber, #987) [Link]

Nepomuk-KDE is not a search tool, but a metadata framework for the next KDE version. It uses Strigi for extracting metadata, indexing and searching. Strigi can use different indexes. One can simply write an index backend and use that. Nepomuk-KDE has implemented an indexing backend for Strigi that uses an RDF store. Strigi itself has virtually no dependencies (libz, libbz2, libxml2), in contrast to the other search engines.
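The pluggable-backend idea described above can be sketched roughly as follows. This is only an illustration in Python, not Strigi's actual C++ API: the names IndexBackend, InMemoryIndex and Indexer are hypothetical, and the "index" is a naive word-to-path map.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class IndexBackend(ABC):
    """Hypothetical interface that any index backend would implement."""

    @abstractmethod
    def add(self, path: str, text: str) -> None:
        """Store the text extracted from the file at `path`."""

    @abstractmethod
    def search(self, term: str) -> set[str]:
        """Return the paths whose text contains the word `term`."""


class InMemoryIndex(IndexBackend):
    """Toy backend: an inverted index kept in a dictionary."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, path: str, text: str) -> None:
        for word in text.lower().split():
            self._postings[word].add(path)

    def search(self, term: str) -> set[str]:
        # .get() avoids creating empty entries for unknown terms.
        return set(self._postings.get(term.lower(), set()))


class Indexer:
    """Feeds extracted text to whichever backend it was given."""

    def __init__(self, backend: IndexBackend) -> None:
        self.backend = backend

    def index(self, documents: dict[str, str]) -> None:
        for path, text in documents.items():
            self.backend.add(path, text)


backend = InMemoryIndex()
Indexer(backend).index({"a.txt": "Hello world", "b.txt": "hello there"})
hits = backend.search("hello")
```

Swapping in a different store (an RDF store, say) would mean writing another IndexBackend subclass; the Indexer would not need to change.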

The speed of data extraction for Strigi is unrivalled. It can extract data from deeply nested files, e.g. from a text file in a zip file attached to an email. This is because it is the only indexer that uses stream-based file analysis. This speed is available to the other search engines too. All they need to do is use the 'xmlindexer' executable that comes with Strigi or link to the library 'libstreamanalyzer'.
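To make the nested-file case concrete, here is a small Python sketch that pulls a text file out of a zip attached to an email without writing anything to disk. It is an assumption-laden illustration, not how libstreamanalyzer works: extract_text and build_demo_email are hypothetical helpers, and each layer is held fully in memory, whereas a real stream-based analyzer would process the data incrementally.

```python
import io
import zipfile
from email import message_from_bytes
from email.message import EmailMessage


def extract_text(data: bytes, name: str) -> list[tuple[str, str]]:
    """Return (nested path, text) pairs for every text leaf inside `data`."""
    found: list[tuple[str, str]] = []
    if name.endswith(".eml"):
        # Email: descend into each named attachment (the body is skipped here).
        for part in message_from_bytes(data).walk():
            filename = part.get_filename()
            if filename:
                payload = part.get_payload(decode=True)
                found += extract_text(payload, f"{name}/{filename}")
    elif zipfile.is_zipfile(io.BytesIO(data)):
        # Zip archive: descend into each member.
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                found += extract_text(archive.read(member), f"{name}/{member}")
    else:
        # Leaf: treat as plain text.
        found.append((name, data.decode("utf-8", errors="replace")))
    return found


def build_demo_email() -> bytes:
    """Build an email carrying a zip attachment that holds one text file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("notes.txt", "desktop search notes")
    message = EmailMessage()
    message["Subject"] = "demo"
    message.set_content("see attachment")
    message.add_attachment(
        buffer.getvalue(),
        maintype="application",
        subtype="zip",
        filename="docs.zip",
    )
    return message.as_bytes()


results = extract_text(build_demo_email(), "mail.eml")
```

Each result carries the full nested path (email, then archive, then file), which is what lets a search hit point back at the containing message.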

KDE4 uses libstreamanalyzer to provide desktop applications with metadata. This ensures consistency between the data in the search index and the data shown in applications.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 11:18 UTC (Tue) by drag (subscriber, #31333) [Link]

I wouldn't go around saying you're the fastest this or the fastest that unless you're actually able to back it up with something.

Tracker for me is very fast. It has an unnoticeable impact even on laptops when it's started for the first time. After running for several days it is still using only 7MB RSS.

Also it's FUSE friendly, which I like since I serve all my media files over sshfs. How does Strigi work on FUSE?

I dunno. I'm willing to try anything, and it seems like the Tracker development folks really aren't that active, unfortunately.

I know that it's going to be a while before we get down to one or two engines that people will like overall.

Does anybody have any experiences with anything else?

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 11:42 UTC (Tue) by oever (subscriber, #987) [Link]

Here's a comparison. Note that it is somewhat outdated.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 17:20 UTC (Tue) by eklitzke (subscriber, #36426) [Link]

That benchmark was released about a week before the latest tracker release, which was supposed to massively improve the speed of tracker. According to the tracker website, the indexing is _much_ faster now, and they claim to be able to index 100 files per second on ext3 (according to them, basically the maximum possible speed taking I/O time into account). This is more than twice as fast as Strigi in the benchmark you posted, although it goes without saying that the tests would need to be run on the same machine to really be comparable.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 19:03 UTC (Tue) by superstoned (subscriber, #33164) [Link]

Still, does tracker have the deep indexing feature?

Anyway, numbers would be good. Maybe I can try to provide some, or LWN.net
will figure something out ;-)

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 11:24 UTC (Tue) by akumria (subscriber, #7773) [Link]

I can feel a grumpy editor's guide to desktop searching coming on.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 15:07 UTC (Tue) by job (subscriber, #670) [Link]

That would be interesting. And please include the proper Unix tools Glimpse and Swish. I've used Glimpse now and then during the last ten years to index my home directory, and these new, insanely bloated graphical desktop utilities add nothing, as far as I can tell. I'm not interested in metadata; it's all plaintext anyway.

Recoll: the search engine for the Linux desktop! :)

Posted Apr 25, 2007 14:32 UTC (Wed) by gvy (guest, #11981) [Link]

Glimpse has nice features like fuzzy search (see also agrep) but it's not free software; Swish-E has had its own problems (and rather stacks up against Xapian Omega or mnoGoSearch indexers which are primarily for web).

Didn't hear about Tracker before; I do maintain Xapian and Recoll packages for ALT Linux.

BTW search.gmane.org is powered by Xapian.

Recoll: A search engine for the Linux desktop (Linux.com)

Posted Apr 24, 2007 20:39 UTC (Tue) by zorgan (guest, #4016) [Link]

> I can feel a grumpy editor's guide to desktop searching coming on.

Yeah, I could really see our editor getting grumpy when he has Beagle running on his laptop every time he starts a session...

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds