PHP Search Engine Showdown (O'ReillyNet)

Here's an O'ReillyNet article about the benefits of adding a search engine to your website. "When you choose to incorporate a local search service, you install the search engine on your server and customize the tool yourself. The advantages of using the local approach are that you can ensure the privacy of your data, you can control the indexing process and search results, and that you have the freedom to implement new features. The disadvantages of installing a local search engine are that indexing and maintenance is your responsibility, and that the index and installation files will use space on your hard drive. You may also incur costs associated with software acquisition--although free, open source software is available."
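To make "control the indexing process and search results" concrete, here is a minimal sketch of what a local search engine does at its core: build an inverted index from page text, then answer queries against it. This is not taken from the article or from any of the engines it compares. The page names, the contents, and the functions tokenize, build_index, and search are all hypothetical, and a real engine adds ranking, stemming, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    # Start from the pages matching the first word, then narrow down.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical site content, for illustration only.
pages = {
    "/about.html": "About our local search engine",
    "/news.html": "News about the search index",
    "/contact.html": "Contact the site maintainers",
}
index = build_index(pages)
results = search(index, "search about")
```

Because the index lives on your own server, you decide which pages go into it and how matches are combined. The cost is that rebuilding it when pages change is your job, which is the maintenance burden the article mentions.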

PHP Search Engine Showdown (O'ReillyNet)

Posted Mar 23, 2007 16:54 UTC (Fri) by jwb (subscriber, #15467) [Link]

Weird article. Unless you have some ideological attachment to PHP, you'd be crazy to use anything other than Lucene for search. Lucene is flexible and very fast and has a powerful query language.

http://lucene.apache.org/

Lucene?

Posted Mar 23, 2007 17:03 UTC (Fri) by rfunk (subscriber, #4054) [Link]

Seems to me that Lucene only makes sense if you're already using Java on your server. And plenty of people have an ideological or practical opposition to running Java on their web server, no matter their feelings about PHP.

Lucene?

Posted Mar 23, 2007 22:19 UTC (Fri) by osma (subscriber, #6912) [Link]

Lucene indeed is great.

FWIW, there are other implementations of Lucene than the Java version. I've used PyLucene but it's kind of a weird hack (it compiles the original Java code into native code using gcj, and uses that as a Python module) and has its limitations - you can only use the Lucene functionality for which Python wrappers have been written, unless you want to write your own glue, which isn't so nice.

Some of the implementations are complete rewrites with no Java code in sight, but they are generally compatible with the Lucene index file format, so you can still use original Lucene tools such as Luke (a Java tool to inspect Lucene's index files).

Links to ports of Lucene can best be found on its Wikipedia page: http://en.wikipedia.org/wiki/Lucene

Lucene?

Posted Mar 23, 2007 23:06 UTC (Fri) by dang (subscriber, #310) [Link]

Actually it makes sense because Lucene is a fantastic tool. There are also some very nice tools built on top of Lucene (see the related projects list at http://lucene.apache.org/java/docs/). I've used and managed larger, more expensive tools like Thunderstone's Texis, but the flexibility and sophistication of Lucene just blow me away.

Lucene?

Posted Mar 25, 2007 20:53 UTC (Sun) by burki99 (subscriber, #17149) [Link]

The question is not Lucene or PHP, since Zend-Search (http://framework.zend.com/manual/en/zend.search.html), a port of Lucene to pure PHP, is now quite stable.

Lucene?

Posted Jul 15, 2008 5:10 UTC (Tue) by barrygould (guest, #4774) [Link]

SPHINX, which is written in C++, is often faster than Lucene and has several interfaces, including SQL.
http://www.sphinxsearch.com/about.html

Barry

PHP Search Engine Showdown (O'ReillyNet)

Posted Mar 23, 2007 17:06 UTC (Fri) by rfunk (subscriber, #4054) [Link]

I usually use mnogosearch, though it has little attachment to PHP. I prefer the Perl query interface.

PHP Search Engine Slowdown

Posted Mar 26, 2007 11:35 UTC (Mon) by eru (subscriber, #2753) [Link]

Yes, that is how my eyes first mis-read the header :-)

Maybe because of a strange experience I have had with the "private" search facilities of most web sites: they almost always run much slower than a Google search of the entire Internet! Sad to say, this is also true of our beloved lwn.net. So unless you have secrets or registered-users-only content, it seems to be best to let Google be your search engine....

PHP Search Engine Slowdown

Posted Mar 26, 2007 20:45 UTC (Mon) by khim (subscriber, #9252) [Link]

Maybe because of a strange experience I have had with the "private" search facilities of most web sites: they almost always run much slower than a Google search of the entire Internet!

That's not strange at all, actually. Any "private" search facility must share resources with the web site itself! Google (and other big search engines) keep most of the Internet in RAM (sort of) - and RAM is faster than disk (where "private" search ends up in the end). So indeed, if you have no secret content, it's just easier to embed Google.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds