Ranking the Web With Radical Transparency (Linux.com)

Linux.com interviews Sylvain Zimmer, founder of the Common Search project, which is an effort to create an open web search engine. "Being transparent means that you can actually understand why our top search result came first, and why the second had a lower ranking. This is why people will be able to trust us and be sure we aren't manipulating results. However for this to work, it needs to apply not only to the results themselves but to the whole organization. This is what we mean by 'radical transparency.' Being a nonprofit doesn't automatically clear us of any ulterior motives, we need to go much further. As a community, we will be able to work on the ranking algorithm collaboratively and in the open, because the code is open source and the data is publicly available. We think that this means the trust in the fairness of the results will actually grow with the size of the community."

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 21, 2016 1:57 UTC (Fri) by JoeBuck (subscriber, #2330) [Link] (9 responses)

The question that was not asked was: how do you keep people from gaming the results, by exploiting the known algorithm to get their bogus pages ranked first? The approach that Google used was to repeatedly change their algorithm, but the problem with doing this completely openly will be that it is transparent which "bad" actors are being punished.

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 21, 2016 2:43 UTC (Fri) by mlinksva (guest, #38268) [Link] (2 responses)

https://about.commonsearch.org/faq says:

> How can you fight spam if your ranking algorithm is open source?
>
> We firmly believe that a large community collaborating on spam detection will stay one step ahead of the game.
>
> It is important that the factors taken into account in the ranking algorithm are aligned with good practices for building user-friendly websites so that optimizing for the algorithm means building a better website for users.

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 21, 2016 10:19 UTC (Fri) by ledow (guest, #11753) [Link] (1 responses)

When even the human-curated sites of old have gone the way of the dodo because they too were scammed and abused into oblivion, I don't hold out much hope.

Even if you give people a way to rank how accurate your search results were and mark spammy results, you can guarantee that that process itself will be used to knock down competitors and big up supporters.

Trying to run a listing in an automated fashion with well-known algorithms, without significant human moderation, sounds like an accident waiting to happen (how long before certain adult sites appear at the top of innocent searches?). And significant human moderation from unknown volunteers just leads to the same kinds of problems as Wikipedia.

If the Wiki and an army of bots can't keep order over there with all their funding, there's no chance.

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 21, 2016 12:03 UTC (Fri) by amarao (guest, #87073) [Link]

If we had an open source search engine with the problems of Wikipedia, I think it would be a huge achievement and an amazing result. I don't know whether Wikipedia has any spam or not, but it gives amazing and relevant answers to adequate searches.

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 21, 2016 22:01 UTC (Fri) by zvyagintsev (guest, #84286) [Link] (5 responses)

On the site dedicated to free software and open standards, let's discuss whether going open is the right thing ;)

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 22, 2016 11:09 UTC (Sat) by Wol (subscriber, #4433) [Link] (4 responses)

I think it's noticeable that many successful Open Source projects are most definitely NOT democracies. We have Guido van Rossum the BDFL, we have Linus (whom I commonly refer to as a tyrant - not the modern meaning of the word, but the original "elected dictator" meaning). There are plenty more - people who *listen*, but are prepared to turn round and say "sorry, not here", and who for the most part are *trusted*.

If this is run with a tight controlling group, who say "this is the way it is" but are prepared for all the code and decisions to be open (as in full view, not as in anybody can join in), then I think it could well be a goer.

I wish it well.

Cheers,
Wol

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 22, 2016 17:45 UTC (Sat) by geek (guest, #45074) [Link] (3 responses)

"If this is run with a tight controlling group, who say "this is the way it is" but are prepared for all the code and decisions to be open (as in full view, not as in anybody can join in), then I think it could well be a goer."

indeed! What could go wrnog?

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 22, 2016 23:16 UTC (Sat) by Wol (subscriber, #4433) [Link] (2 responses)

What could go wrogn? Lots.

But if it's all Open Source, then if stuff does go wrong, just fork it ... :-)

If there's NOT a BDFL, as others have pointed out, there's an awful lot else that could go wrong instead ...

Cheers,
Wol

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 22, 2016 23:49 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> What could go wrogn?
LOL.

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 23, 2016 7:49 UTC (Sun) by BlueLightning (subscriber, #38978) [Link]

I think this is an example of "anything that can go wrogn, will go wrogn."

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 21, 2016 11:53 UTC (Fri) by oldtomas (guest, #72579) [Link] (10 responses)

Kudos for fighting the good fight. I wholeheartedly agree with TFA:

"it is critical for the Internet and ultimately for our society to have a healthy diversity in its sources of information".

But this is preposterous:

"the only search engines currently available are for-profit entities, so the Common Search project is creating a nonprofit engine that is open, transparent, and independent".

Never heard of YaCy [1]? Of Searx[2]? (and I'm sure there are more).

I know, "preposterous" is a bit strong: I'm not implying malice, but I see that antipattern which befalls free software as of late: "I'm the only kid on the block, and yes, I keep my eyes firmly closed so I can't see the others!"

[1] https://en.wikipedia.org/wiki/YaCy
[2] https://searx.laquadrature.net/about
(yeah, I know: both are very different beasts: the one is a distributed search engine, the other a distributed meta-search-engine.)

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 23, 2016 7:54 UTC (Sun) by NightMonkey (subscriber, #23051) [Link] (9 responses)

Sadly that antipattern is infesting much of web and app-focused development. NIH writ large. "What's the use of code reuse when I have a GitHub profile and resume to build?"

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 24, 2016 7:44 UTC (Mon) by oldtomas (guest, #72579) [Link] (8 responses)

Well, I wasn't so much complaining about implementation. On the contrary: I think a re-implementation can be a fresh and inspiring re-interpretation of an old theme. As when a mathematician finds a new, simpler proof for a known theorem.

What miffs me is this (sometimes, I suspect, semi-deliberate) lack of a culture of saying "oh, such-and-such have already tried something similar", perhaps with a tip of the hat. Perhaps even with a sentence or two about the similarities and differences. As if people were afraid to even look at potential "competitors". Woah. They might be out to eat my lunch.

You know? Simple and pure academic honesty.

I just installed python-doc and grepped the 30+ megabytes for occam or Miranda[1]. No hits.

[1] two languages which introduced indentation syntax 8 resp 6 years before Python. I don't even know whether those were first.

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 24, 2016 20:13 UTC (Mon) by Wol (subscriber, #4433) [Link] (3 responses)

> [1] two languages which introduced indentation syntax 8 resp 6 years before Python. I don't even know whether those were first.

What about FORTRAN IV?

Okay, very much not indentation syntax in the way you're thinking, but the column still very much had meaning. Can't remember the details, but columns 1-5 are the label, 6 is continuation(?), 7-72 are code and 73-80 are human comment.

If you didn't indent your code it wouldn't compile :-)

Cheers,
Wol
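
The column layout described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field boundaries are taken from the comment (which itself hedges on the details), the function name `split_card` and the dictionary keys are hypothetical, and the rule that a blank or `0` in column 6 means "not a continuation" is the usual fixed-form convention rather than something stated in the thread.

```python
def split_card(line: str) -> dict:
    """Split one fixed-form source line into its column fields.

    Hypothetical helper for illustration; not taken from any compiler.
    """
    # Punch cards have 80 columns: pad short lines, drop anything beyond.
    card = line.rstrip("\n").ljust(80)[:80]
    return {
        "label": card[0:5].strip(),           # columns 1-5: statement label
        "continuation": card[5] not in " 0",  # column 6: continuation mark
        "code": card[6:72].rstrip(),          # columns 7-72: the statement
        "trailer": card[72:80].strip(),       # columns 73-80: "human comment"
    }


# A properly indented statement with label 100.
print(split_card("  100 X = 1"))

# The same statement without indentation: the text lands in the label
# field and the code field stays empty.
print(split_card("X = 1"))
```

The second call shows why unindented code would not compile: nothing ever reaches columns 7-72.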

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 25, 2016 7:31 UTC (Tue) by oldtomas (guest, #72579) [Link] (1 responses)

Hah. Was my first language (yes, I'm *that* old :-)

OK, FORTRAN syntax was weird. Especially its total ignorance of white space, even within identifiers ("IEVENTCOUNTER" being the same as "I EVENT COUNTER" as "I E V E ..." you get the idea).
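
A rough way to picture that blank-insensitivity, as a Python sketch: treat a statement as if every blank were deleted before it is read. The helper name `squeeze_blanks` is made up for this example, and the sketch deliberately ignores the one place where blanks did matter (inside Hollerith and character constants).

```python
def squeeze_blanks(statement: str) -> str:
    """Remove every blank, the way fixed-form FORTRAN read a statement.

    Hypothetical helper; blanks inside character constants are not handled.
    """
    return statement.replace(" ", "")


# All of these spellings name the same identifier.
assert squeeze_blanks("I EVENT COUNTER") == squeeze_blanks("IEVENTCOUNTER")
assert squeeze_blanks("I E V E N T COUNTER") == "IEVENTCOUNTER"

# Blanks between keywords, labels and names vanish in the same way.
print(squeeze_blanks("DO 10 I = 1.10"))  # DO10I=1.10
```

The last line hints at why this was error-prone: once the blanks are gone, a mistyped loop header reads as an assignment to a variable called DO10I.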

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 25, 2016 12:24 UTC (Tue) by Wol (subscriber, #4433) [Link]

Mine too. Fortran 77 had been out for a couple of years, but the computer came with a free FORTRAN compiler, and Fortran 77 was a pay-for extra. It also came with SPL and PLP (variants of PL/1) - any language they used to build the OS itself was a freebie, the others cost.

I don't think I actually tampered with the OS itself, but I do remember reporting a bug in the printing subsystem and getting it fixed - iirc if you wrote a line more than 256 characters long, the print system shoved in a line-feed. Oh the joys of debugging mini-computer word-processing on new hardware like daisywheel printers. That bug messed up printing in bold (it used carriage returns and overstrikes), and another bug kept on breaking daisywheels - at a couple of pounds a time back then, that was an *expensive* bug!

And I wrote a utility that downloaded soft-fonts to the early laser printers ... those were the days ...

Cheers,
Wol

Ranking the Web With Radical Transparency (Linux.com)

Posted Oct 25, 2016 7:41 UTC (Tue) by jem (subscriber, #24231) [Link]

And column numbers > 80 didn't exist, because punch cards only have 80 columns.

Ranking the Web With Radical Transparency (Linux.com)

Posted Nov 3, 2016 16:20 UTC (Thu) by azz (subscriber, #371) [Link] (3 responses)

Python's indentation-based syntax came from ABC, which Guido van Rossum worked on beforehand; "Designing a Beginner's Programming Language" from 1975 describes the syntax, and says it was inspired by how the Algol 68 report formatted its examples: http://oai.cwi.nl/oai/asset/9452/9452A.pdf

Miranda's indentation-based syntax -- which inspired Haskell's -- came from David Turner's earlier language SASL via KRC; Turner's "Some History of Functional Programming Languages" says that the "offside rule" was introduced in 1976.

occam (which I spent a long time working on myself!) dates from the early 80s -- the occam 1 manual is 1984 -- so it postdates both of these threads of development. I suspect it was inspired by how Tony Hoare formatted his early CSP examples.

Ranking the Web With Radical Transparency (Linux.com)

Posted Nov 4, 2016 11:27 UTC (Fri) by azz (subscriber, #371) [Link] (2 responses)

... and -- having read "Some History" more carefully -- SASL in turn got the offside rule from the hypothetical ISWIM language family described in P. J. Landin's "The Next 700 Programming Languages" from 1966 (http://www.inf.ed.ac.uk/teaching/courses/epl/Landin66.pdf).

Indentation syntax

Posted Nov 8, 2016 9:05 UTC (Tue) by oldtomas (guest, #72579) [Link] (1 responses)

Thanks for this tour of history!

Indentation syntax

Posted Nov 17, 2016 20:36 UTC (Thu) by mp (subscriber, #5615) [Link]

Thanks indeed.
And here's a link to "Some History": https://www.cs.kent.ac.uk/people/staff/dat/tfp12/tfp12.pdf


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds