The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Val Henson takes a look at content-based addressing. "Used properly, content-based addressing dramatically reduces bandwidth use, simplifies code, and improves security. But used when it doesn't make sense, it can reduce performance, increase bandwidth usage, and create new security risks. Figuring out when content-based addressing is good and when it is very, very bad requires an understanding of both systems programming and mathematics. What's a programmer to do? In this article, we'll lay out the considerations in language that any programmer (and even some managers) can understand."

The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Posted Nov 16, 2007 20:06 UTC (Fri) by nix (subscriber, #2304) [Link]

Well, LinuxWorld did an excellent job of rendering that typically superb 
article as hard to read as possible. Blinking Flash ads, blinking gifs, an 
article sliced into pages approximately two paragraphs long... I gave up 
before the end because I just can't read something at a rate of two 
paragraphs per minute, and I don't have the kind of bandwidth here that 
would be needed to read it any faster.

(I mean, there were more Flash ads than *paragraphs*, FFS. I'd be glad if 
Val could post this in a form that's actually readable, because LinuxWorld 
are very close to going into the /etc/hosts 127.0.0.1 bucket here. There's 
no point even trying to read anything posted there anymore.)

The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Posted Nov 16, 2007 20:44 UTC (Fri) by basic (guest, #12705) [Link]

heh, try this link

The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Posted Nov 16, 2007 21:09 UTC (Fri) by vmole (guest, #111) [Link]

Mr. Basic, you're my hero.

The code monkey's guide to cryptographic hashes for content-based

Posted Nov 16, 2007 21:13 UTC (Fri) by pflugstad (subscriber, #224) [Link]

The code monkey's guide to cryptographic hashes for content-based

Posted Nov 16, 2007 21:20 UTC (Fri) by vmole (guest, #111) [Link]

How does adblock or flashblock convert the N-page article to one single page?

The code monkey's guide to cryptographic hashes for content-based

Posted Nov 17, 2007 0:54 UTC (Sat) by beoba (guest, #16942) [Link]

A feature request if I ever heard one!

The code monkey's guide to cryptographic hashes for content-based

Posted Nov 17, 2007 4:23 UTC (Sat) by pflugstad (subscriber, #224) [Link]

It doesn't,  but it deals with the blinking flash ads and blinking gifs the OP was complaining
about.  Although the link basic posted was clearly the best solution, if you can't have that,
then getting rid of the blinking crap is the next best thing.

I'm still astonished every time I surf the web without flash/ad block.  I honestly don't know
how people can stand it.

The code monkey's guide to cryptographic hashes for content-based

Posted Nov 17, 2007 11:30 UTC (Sat) by nix (subscriber, #2304) [Link]

Oh, agreed, although personally on my home systems I use Polipo and don't 
have flash installed. However I was reading that article at work (while 
waiting for some interminable test runs to finish) where we have no 
control over the browser configuration. Flashing ads are omnipresent :/

The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Posted Nov 16, 2007 21:56 UTC (Fri) by nix (subscriber, #2304) [Link]

Well, *that* was easy to spot from the page layout (not). :/

Thank you!

Nobody complains about the format.

Posted Nov 16, 2007 21:51 UTC (Fri) by dmarti (subscriber, #11625) [Link]

Please complain. Send me mail at dmarti@linuxworld.com, start a petition, put something in a blog, whatever. Flame on, people, please.

As far as Management knows, the LinuxWorld.com readers all use MSIE and don't mind the multi-page format.

The only reader complaints I have about the site format are my own, so I look like an ass waving LWN or Red Hat Magazine layouts around and making what look like completely unsubstantiated claims about what the Linux market wants.

Nobody complains about the format.

Posted Nov 16, 2007 22:10 UTC (Fri) by nix (subscriber, #2304) [Link]

WTF? Management seriously thinks that a Linux site's users are likely to 
be all IE because of the lack of response to an annoying in-your-face 
popup? (It doesn't matter what's on the popup: if it pops up on a browser 
that doesn't block such things I close it instantly without reading, and 
generally navigate away from the whole site as well, especially if, as 
some sites do, it throws the same damn popup at me whenever I navigate to 
any new page, even internally; WashPo, I'm looking at you.)

Multi-page formats are OK, as long as the page breaks are at reasonable 
separations: enormous pages make scrolling slow in some browsers and make 
initial page loading take too long. But one page every *two paragraphs* is 
madness. It pushes loading times up again, I can't search the result, I 
can't effectively flip back to see what was written a few paragraphs 
earlier: it's like listening to a lecture instead of reading, except that 
the Flash ads make it like listening to a lecture with someone screaming 
adverts in your ear all the time.

(My antipathy to moving or brightly coloured graphical ads is somewhat 
extreme because I've got amblyopia, so a brightly coloured or flashing 
lump off to one side of my visual field will tend to overwhelm the 
*interesting* stuff which I'm actually looking at with the other eye 
unless I close one eye completely, and reading with one eye shut is 
somewhat annoying. In my case it's the left side that's bad, but about 
half of amblyopics have a dominant left eye so would be similarly affected 
by bright right-side blobs. Amblyopia is not rare.

I wonder what a photosensitive epileptic would do on a site like 
LinuxWorld? Turn all images off, I suppose.)


LWN's pages are pretty much the right length, I'd say. People reading 
articles aren't scared of scrollbars and aren't going to fail to notice 
that they have to scroll down...

Nobody complains about the format.

Posted Nov 17, 2007 6:10 UTC (Sat) by iabervon (subscriber, #722) [Link]

I can't stand to have a Flash plugin installed without Flashblock. Even when I specifically go
to a page to look at something Flash, I want it to only go under my control, because they
distract me from what I'm doing in other windows and sometimes crash my browser (which isn't
too big a deal, since I can restore my session, but then I really want the Flash to not go the
next time). I have "image.animation_mode" set to "once", so that it'll play animations I want
to see (like weather radar), but looping stuff doesn't loop. This makes ads less annoying, but
I care more about user icons and such. It seems like, while the average ad is worse than the
average user icon, the worst user icons I see are far worse than the worst ads; I've been
sorely tempted to ask other people to disable looping on their web browsers when they're
across the room from me.

I also generally click on links and then iconify the window and do something else while it
loads, so most ads are done before I actually view the page at all. The main thing that still
annoys me is that Firefox sometimes pops up new windows (sometimes ads, sometimes not); I want
it to never pop up a new window unless I select it from the menu, and I haven't figured out
how to make that happen.

Nobody complains about the format.

Posted Nov 17, 2007 7:52 UTC (Sat) by dmarti (subscriber, #11625) [Link]

The list of extensions I use includes AdBlock Plus, Flashblock, and CustomizeGoogle.

To set when a popup can be triggered, you might want to check privacy.popups.disable_from_plugins and dom.popup_allowed_events in about:config.

Nobody complains about the format.

Posted Nov 17, 2007 17:18 UTC (Sat) by iabervon (subscriber, #722) [Link]

Thanks! That's a lot closer, although, of course, now that I'm looking at it, I want a bunch
of other things. (1) If the privacy rules block everything in an onclick attribute, ignore it
instead of making clicking do nothing (that is, if it's a link aside from that, use the link);
(2) The "popup blocked" banner ought to have a button for showing the popup which was blocked,
so you can do the Flashblock-style interaction where you tell the browser to show you
something if and only if you want to see that particular thing; (3) Put popups in tabs instead
of new windows, since I sometimes want the content, but never want it in a popup.

I think I'm also missing "popup blocked" banners for pages that do the popup as I leave the
page, but I don't care all that much.

Nobody SHOULD complain about the format.

Posted Nov 17, 2007 0:41 UTC (Sat) by khim (subscriber, #9252) [Link]

There is a simple yet sad fact: 99% of users cannot say what they actually want. They will vote with their feet if you do something with your site (as I suspect a lot of people already did with regard to LinuxWorld), but if you conduct a survey, you'll never find the right answers.

If you want to see how well visitors like your multi-page approach, give some users a one-page version and other users a multi-page version, and count the users who dropped out before the last page was reached. If you want to know how many users will close the page without reading anything when you add a blinking Flash advertisement, add a silent "close page" handler and count them. Yes, it's not easy to measure satisfaction, and a survey is much easier, but that's the only way to get a somewhat unbiased result. Any survey will be quite biased just because it only includes results from the rare people who decided to participate in a survey at all!
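
A rough sketch of the counting described above, in Python. The record layout (variant name, last page reached, total pages) is made up for illustration, and for a one-page variant "reached the last page" would need some proxy such as scroll depth, which is simply assumed here:

```python
from collections import defaultdict

def completion_rates(visits):
    """Fraction of visitors per variant who reached the last page.

    `visits` is an iterable of (variant, last_page_reached, total_pages)
    tuples; this layout is hypothetical, not any real analytics format.
    """
    started = defaultdict(int)
    finished = defaultdict(int)
    for variant, last_page, total_pages in visits:
        started[variant] += 1
        if last_page >= total_pages:
            finished[variant] += 1
    # Completion rate = finished / started, computed per variant.
    return {variant: finished[variant] / started[variant] for variant in started}
```

Comparing the rates for the two variants gives the drop-out difference without asking anyone to fill in a survey.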

Nobody complains about the format.

Posted Nov 17, 2007 1:28 UTC (Sat) by brianomahoney (subscriber, #6206) [Link]

Multi page sucks --- always, and the more it is done the dumber it
looks

--
Brain

Nobody complains about the format.

Posted Nov 17, 2007 11:30 UTC (Sat) by nix (subscriber, #2304) [Link]

You'd prefer LWN's One Big Page format?

One Big Page ? Yes!

Posted Nov 17, 2007 13:03 UTC (Sat) by khim (subscriber, #9252) [Link]

You nailed it. I like the one-page version. If the internet is fast it does not matter; if it's slow (I sometimes read LWN over GPRS) it's even more important: I can read a few top articles while the rest is loading and then I can just scroll, easy and fast. Of course if you have superslow internet (150bps or 300bps) then it's a pain, but if you consider the fact that one page of LinuxWorld's article (with a few paragraphs of text) is heavier than the whole of LWN's weekly "one big page"... it's not even a contest. Of course it's mostly pictures and Flash ads, but even just the .html files combined for this one article are more than the text of LWN's big page!

Of course there are limits: if you combine all the "one big page"s on LWN into one superbig page, it'll be too much. But to split an article? I'd prefer even "What every programmer should know about memory." as one page, and I'm not alone there...

One Big Page ? Yes!

Posted Nov 19, 2007 13:16 UTC (Mon) by tekNico (guest, #22) [Link]

A little known secret: append this
bigpage?format=printable
to an LWN article URL. You get One Big Page, and get rid of the left column at the same time. Wide, long bliss. ;-)

One Big Page ? Yes!

Posted Nov 19, 2007 21:01 UTC (Mon) by oak (guest, #2786) [Link]

Oooh, what bliss... That really makes a difference when reading these 
pages on N800 (800x480@226DPI resolution), thanks!!!

Nobody complains about the format.

Posted Nov 21, 2007 16:30 UTC (Wed) by stevem (subscriber, #1512) [Link]

Absolutely, every time. The single page can load in the background while I do other stuff,
then when I want to read it later it's all there. I *don't* want to have to wait for
individual page loads later.

I also middle-click on huge numbers of the articles/comments links as I go through the weekly
edition and read more details on each at the end. Again, that means the computer can do the
job of waiting rather than me.

Nobody complains about the format.

Posted Nov 23, 2007 23:12 UTC (Fri) by noise (guest, #2923) [Link]

Even better is to open all those background tabs before you hop on a long flight - plenty of
offline 
reading material!



The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Posted Nov 16, 2007 22:31 UTC (Fri) by nix (subscriber, #2304) [Link]

Now I'm done moaning, what an excellent article: clear, concise, and even funny (humour and hashing: two subjects I thought could never be mixed). I'll be showing it to newbies, definitely.

(The lifecycle table was hilarious, and oh so true.)

btw, one fault.

> The simplest case is when the data is always different. In that case, computing the hash is an unnecessary overhead and should only be included if there is some compelling security threat that it would solve.

This is true only if you're using the hash as a high-speed comparator to reduce data traffic somehow. Filesystems that use content-based addressing, such as git, certainly don't consider computing hashes of (say) new blobs to be unnecessary overhead! Equally, when the data is the same, computing a hash of the new object and comparing it with a hash of the old one may well be substantially faster than doing a bytewise comparison of both objects, because the hash-and-compare approach dirties half as much cache and pulls half as much data out of main memory (in the case where the objects are identical: obviously if they're different lengths, and the length is known without byte-counting, a hash comparison is truly pointless).

(This is implied further down, and of course Val knows it as she could eat me alive and crunch my bones in the hash function league, but she never actually says it.)
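
To make the git point concrete, here is a minimal sketch in Python of a content-addressed blob store. The `git_blob_id` function follows git's SHA-1 blob naming (a `blob <size>\0` header followed by the content); the `ContentStore` class is a hypothetical in-memory toy, not git's actual storage code:

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Name of a blob as git's SHA-1 object format computes it."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = git_blob_id(data)
        # Identical content hashes to the same key, so it is kept only once.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)
```

Here the hash is the address itself, so computing it for a brand-new blob is the cost of storing the object at all, not overhead layered on top of some other lookup.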

One-page view please!

Posted Nov 17, 2007 11:06 UTC (Sat) by job (guest, #670) [Link]

Please, dear editor, pretty please adopt a policy never to link to multi-page articles when there is a one-page layout available. I want to actually read the articles you link to as I expect there is something interesting for a Linux user such as myself there, and there is nothing more annoying than having to load a new page every other paragraph. (I hope my use of bold impresses you ;) but this is a sincere request.)

Translation from English to English: please make one-page layout unavailable

Posted Nov 17, 2007 14:00 UTC (Sat) by khim (subscriber, #9252) [Link]

Nobody likes multi-page layout. Yet it's still the norm. Why? Money. The multi-page format allows you to put in more advertisements. Thus it's the main format for most new sites today. Many have already adopted the practice of making the one-page format either a) hard to find or b) just plain unavailable. If LWN and other disaggregator sites adopt the policy you've suggested, more and more sites will switch from category a) to category b). Do you REALLY want THIS?

GreaseMonkey scripts will help you - again till they'll be widely deployed...

One-page layout

Posted Nov 17, 2007 18:00 UTC (Sat) by job (guest, #670) [Link]

Not really, but in this case it would have been the lesser of two evils. Links such as this
news story annoy a lot of people, obviously. Many popular web sites such as Slashdot and
Reddit sometimes link to single page layouts already, and I doubt LWN would make a notable
difference advertisement-wise. Perhaps linking to print views will send a message to web sites
that annoying users is not a good business model.

Meh, Val Henson addressing (LinuxWorld)

Posted Nov 19, 2007 11:38 UTC (Mon) by Tv (subscriber, #7109) [Link]

I choose not to consider Val Henson an expert on cryptographical hashes and their use. I think
she picked an opinion first, and is looking for facts to back it up -- not the other way
around. And not even always sticking with the facts.

One response to an earlier paper by Val:

http://www.cs.colorado.edu/~jrblack/papers/cbh.html

Meh, Val Henson addressing (LinuxWorld)

Posted Nov 19, 2007 12:16 UTC (Mon) by flewellyn (subscriber, #5047) [Link]

I think it's a bit much to take the fact that there is obviously disagreement in the literature between Henson and some other researchers on this subject, and from there decide that Henson is, as you allege, picking an opinion first and then looking for facts to back it up.

The paper you linked to shows one side of a scholarly debate in progress, which is normal for an area of active research; I hardly consider it evidence that Henson's knowledge is lacking, or that her scholarship is less than honest. Do you have anything to substantiate those claims?

Val Henson as a cryptographer

Posted Nov 20, 2007 19:49 UTC (Tue) by man_ls (subscriber, #15091) [Link]

It does not sound like a scholarly debate, but it rather looks like a complete rebuttal. Anyway, it seems that Ms Henson has learned her lesson: she is no longer proposing that content-based addressing with broken hashing functions is dangerous or otherwise broken. Learning from your mistakes is the second best thing (after being always right).

Meh, Val Henson addressing (LinuxWorld)

Posted Nov 22, 2007 20:51 UTC (Thu) by lysse (guest, #3190) [Link]

> I choose not to consider Val Henson an expert on cryptographical hashes and their use.

That's nice for you. Why should I care?

The code monkey's guide to cryptographic hashes for content-based addressing (LinuxWorld)

Posted Nov 22, 2007 4:27 UTC (Thu) by dvdeug (subscriber, #10998) [Link]

As for the article itself, I'm not sure of the reason to use cryptographic hashes here.

The author says "But if you could find a hash function that almost never had collisions, [...]
Fortunately, hash functions that are collision-resistant (simply put, very difficult to find
inputs with same output) have already been developed for another purpose - they are called
cryptographic hash functions." But those are two different things; any decent hash function
given arbitrary input will collide with a probability of 1/(2^n), where n is the number of
bits. Cryptographic hash functions aren't less likely to collide by chance; they're merely
very hard to deliberately make collide. 

With CVS, what do you gain by uploading a chunk of data that deliberately collides with
another chunk of data? Where does cryptographic security help here?
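
For a sense of scale on the chance-collision figure, here is a small Python sketch. It assumes an idealized hash whose n-bit outputs are uniformly random, and it uses the standard birthday-bound approximation rather than an exact count; the function names are made up for illustration:

```python
import math

def pair_collision_probability(n_bits: int) -> float:
    """Chance that two given distinct inputs share an n-bit hash: 1/(2^n)."""
    return 2.0 ** -n_bits

def any_collision_probability(num_items: int, n_bits: int) -> float:
    """Birthday-bound approximation for at least one collision among
    num_items hashed values: 1 - exp(-k(k-1) / 2^(n+1))."""
    exponent = -num_items * (num_items - 1) / 2.0 ** (n_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)
```

Under that assumption the figure depends only on the output width n, which fits the point that cryptographic strength matters for deliberate collisions rather than accidental ones.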

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds