
Google crawls into source-code search (ZDNet)

ZDNet looks at the new Google Code Search site. "Google is taking its search expertise to one of its favorite audiences: software developers. The company on Thursday launched a Web site, Google Code Search, which the company says will let programmers search billions of lines of code for tips on how to write their own software. The service, conceived by the Google Labs early technology group, will crawl publicly available code, most of which is made available through open-source projects. The search and indexing covers code on Web pages and code that resides in compressed files, said Tom Stocky, a product manager at Google."

Google crawls into source-code search (ZDNet)

Posted Oct 5, 2006 18:31 UTC (Thu) by sjj (guest, #2020) [Link]

Will proprietary software houses ban developers from using Google now?

Google crawls into source-code search (ZDNet)

Posted Oct 5, 2006 20:41 UTC (Thu) by TwoTimeGrime (guest, #11688) [Link]

Why would they do that? This makes me wonder. I know that the GPL grants extra rights, and if you don't accept it then you are bound by copyright law. But copyright law has the fair use clause, which I was reading about the other day. If you take a 20-line function from a 20,000-line GPL program, I wonder whether you could use it under the fair use clause.

I am not a lawyer though.

Google crawls into source-code search (ZDNet)

Posted Oct 5, 2006 22:45 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

A complete, working function? Highly doubtful, and not worth your company's lawyers' time to save you from writing 20 lines.

Google crawls into source-code search (ZDNet)

Posted Oct 6, 2006 7:39 UTC (Fri) by khim (subscriber, #9252) [Link]

It's kind of useless to borrow trivial code, and if you borrow non-trivial code then it's not "fair use" in any case, whether it's 20 lines or 200 lines...

Google crawls into source-code search (ZDNet)

Posted Oct 6, 2006 14:25 UTC (Fri) by tjc (subscriber, #137) [Link]

There have been enough GPL violation lawsuits by now that I think most people realize that they would have to modify the code in some way before redistributing it.

Google crawls into source-code search (ZDNet)

Posted Oct 5, 2006 19:42 UTC (Thu) by alexl (subscriber, #19068) [Link]

Hmmm, searching for "page rank" doesn't return anything interesting :)

Google crawls into source-code search (ZDNet)

Posted Oct 5, 2006 19:47 UTC (Thu) by riteshsarraf (subscriber, #11138) [Link]

This will help rapidly increase the number of "GODs".

GOD = Google Oriented Developer

:-)
Almost all the services companies in India will definitely ban/block the site.

I feel like I'm 7 ...

Posted Oct 5, 2006 21:38 UTC (Thu) by hummassa (subscriber, #307) [Link]

I just googled for the F-word ... results 1-10 of about 32,000 (0.11 seconds)
so, I googled, then I giggled :-)

I feel like I'm 7 ...

Posted Oct 5, 2006 22:01 UTC (Thu) by pjdc (guest, #6906) [Link]

That's odd, when I searched for "fixme" I got 620,000 hits.

Try "foo"

Posted Oct 5, 2006 22:42 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

1.48 million.

Try "foo"

Posted Oct 5, 2006 23:13 UTC (Thu) by Mithrandir (subscriber, #3031) [Link]

"int i"

Results 1 - 10 of about 1,990,000. (0.11 seconds)

Try "foo"

Posted Oct 5, 2006 23:27 UTC (Thu) by dilinger (subscriber, #2867) [Link]

"#define true"

Results 1 - 10 of about 41,100. (0.05 seconds)

Welcome to the future!

"goto"

Posted Oct 6, 2006 0:26 UTC (Fri) by yokem_55 (guest, #10498) [Link]

Results 1 - 10 of about 953,000. (0.09 seconds)

"this shouldn't work"

Posted Oct 6, 2006 16:46 UTC (Fri) by AJWM (subscriber, #15888) [Link]

Results 1 - 10 of 13. (0.03 seconds)

Actually I was kind of expecting more ;-)

"this should never happen"

Posted Oct 6, 2006 22:28 UTC (Fri) by BlueLightning (subscriber, #38978) [Link]

"this should never happen" - 35,800

"just in case" - 152,000

Unix heritage

Posted Oct 7, 2006 9:38 UTC (Sat) by glettieri (subscriber, #15705) [Link]

"you are not expected to understand this" - 295,000

incorrect license labelling?

Posted Oct 6, 2006 2:10 UTC (Fri) by stevenj (subscriber, #421) [Link]

I noticed with amusement that one of my searches turned up my own code, from my own tarball, but incorrectly labelled as LGPL instead of GPL. (This despite the fact that clicking the source code link turned up the file with the standard boilerplate GPL statement at the top.)

I wonder what algorithm they are using to identify the license? Clearly it has some bugs still.

incorrect license labelling?

Posted Oct 6, 2006 9:42 UTC (Fri) by etienne_lorrain@yahoo.fr (subscriber, #38022) [Link]

You are lucky - mine appears as "Unknown License - C".
Is it so difficult to detect the file "gpl.txt" at the top of the tar.gz, or the standard GPL header of the C file, or is it a more-or-less voluntary misfeature, so as not to disturb the customer too much?
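
Nothing in this thread says how Code Search actually assigns licenses. Purely as an illustration of the header check suggested above, a detector could normalize the leading comment and look for the standard boilerplate wording. The function name `guess_license` and the phrase list are assumptions made for this sketch, not anything taken from the service:

```python
import re


def guess_license(header_text):
    """Guess a license label from the leading comment of a source file."""
    # Drop comment decoration (*, #, /) and collapse whitespace so that a
    # phrase wrapped across several comment lines still matches.
    text = re.sub(r"[*#/]+", " ", header_text)
    text = re.sub(r"\s+", " ", text).lower()

    # Test the LGPL names first: "general public license" is a substring
    # of both of them, so checking GPL first would mislabel LGPL files.
    if "lesser general public license" in text:
        return "LGPL"
    if "library general public license" in text:
        return "LGPL"
    if "general public license" in text:
        return "GPL"
    return "Unknown"


gpl_header = """/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public
 * License as published by the Free Software Foundation.
 */"""
print(guess_license(gpl_header))  # prints: GPL
```

A substring check like this is easy to fool (a file that merely mentions the GPL would be labelled GPL), so it should be read as a starting point rather than a reliable classifier.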

incorrect license labelling?

Posted Oct 12, 2006 14:37 UTC (Thu) by mmarsh (subscriber, #17029) [Link]

It seems to get the license right for my code, but thinks a .h file containing

template< unsigned int NumT , unsigned int ThreshT >
class Combinatoric : public CODEX_ASN1::Base

is C.
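
A header like the one quoted above cannot be classified by its `.h` extension alone, since C and C++ share it. As a rough sketch (the function name `looks_like_cpp` and the marker list are assumptions for illustration, not how Code Search works), a language guesser could look for constructs that are typical of C++ and rare in C:

```python
import re

# Hypothetical markers. They can still misfire on C code that uses these
# words as identifiers or mentions them in comments or strings.
_CPP_MARKERS = [
    re.compile(r"\btemplate\s*<"),        # template declarations
    re.compile(r"\bclass\s+\w+\s*[:{]"),  # class definitions
    re.compile(r"\bnamespace\b"),         # namespaces
    re.compile(r"\w+::\w+"),              # scope resolution operator
]


def looks_like_cpp(source):
    """Return True if the source contains any C++-style construct."""
    return any(pattern.search(source) for pattern in _CPP_MARKERS)


header = (
    "template< unsigned int NumT , unsigned int ThreshT >\n"
    "class Combinatoric : public CODEX_ASN1::Base\n"
)
print(looks_like_cpp(header))                   # prints: True
print(looks_like_cpp("int add(int a, int b);"))  # prints: False
```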

regexps!

Posted Oct 6, 2006 15:35 UTC (Fri) by wjhenney (guest, #11768) [Link]

The most interesting part to me is that it supports regular expressions. Now if only they would do that for the Google web searches.

regexps!

Posted Oct 12, 2006 2:43 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Would .* with an S modifier (matching \n) trigger a meltdown? ;)
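
The thread does not say which modifiers Code Search accepts. For readers unfamiliar with the modifier itself: in Perl-style regular expressions, `/s` lets `.` match a newline, and Python's equivalent flag is `re.DOTALL`. A small sketch of the difference, using a made-up source string:

```python
import re

source = "int main(void)\n{\n    return 0;\n}\n"

# Without DOTALL, "." does not match "\n", so ".*" stops at the line end.
per_line = re.search(r"int.*", source).group(0)

# With DOTALL, ".*" runs greedily to the end of the whole string.
whole_file = re.search(r"int.*", source, re.DOTALL).group(0)

print(repr(per_line))        # prints: 'int main(void)'
print(whole_file == source)  # prints: True
```

This is why an unanchored `.*` with that modifier is costly over a large corpus: each match can extend to the end of the file.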

Great for finding vulnerabilities?

Posted Oct 9, 2006 13:20 UTC (Mon) by Cato (subscriber, #7643) [Link]

Presumably this is good for finding simple security holes in open source software, across a wide range of programs - e.g. use of unsafe system() in Perl programs, or format string errors in C...
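
As a minimal sketch of the kind of pattern this would involve, the regular expression below flags C calls of the form `printf(name)`, where a variable rather than a string literal is used as the format string. The pattern and the function name `find_format_string_suspects` are assumptions for illustration; the sketch covers only plain `printf` and will miss `fprintf`, `syslog`, and similar calls:

```python
import re

# Flags printf(name) where the only argument is a bare identifier, which
# means a variable, not a string literal, is used as the format string.
PRINTF_VAR = re.compile(r"\bprintf\s*\(\s*([A-Za-z_]\w*)\s*\)")


def find_format_string_suspects(c_source):
    """Return (line_number, variable_name) pairs for suspicious calls."""
    suspects = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        for match in PRINTF_VAR.finditer(line):
            suspects.append((lineno, match.group(1)))
    return suspects


sample = 'printf("%s\\n", user_input);\nprintf(user_input);\n'
print(find_format_string_suspects(sample))  # prints: [(2, 'user_input')]
```

A hit is only a lead: whether the call is exploitable depends on where the variable's contents come from.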

Great for finding vulnerabilities?

Posted Oct 16, 2006 22:07 UTC (Mon) by roelofs (subscriber, #2599) [Link]

> Presumably this is good for finding simple security holes in open source software, across a wide range of programs - e.g. use of unsafe system() in Perl programs, or format string errors in C...

...backdoor passwords, possible buffer overruns (flagged with FIXMEs or XXXs), etc. Oh my, yes. Check out the links on this InfoWorld blog posting:

Code Search joins hackers' toolbelt

(where by "hackers" they mean "crackers").

Greg

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds