LWN.net Logo

designed to fail

designed to fail

Posted Apr 14, 2005 15:50 UTC (Thu) by vgough (guest, #2781)
In reply to: Where human intuition fails... by pkolloch
Parent article: Call to atention about using hash functions as content indexers (SCM saga)

I think you've misunderstood Pedro's comments. If you read his comment carefully, you will see why he doesn't care about the discussion of probabilities.

The arguments for and against these sorts of approaches are similar to the jokes about mathematicians vs. engineers. This is not a problem with intuition or misunderstanding of probabilities, it about how people view the job of algorithms, and how they should be designed.

I agree with Pedro that, ideally, algorithms should not be designed with known possibilities of failure (even very small possibilities) - which puts me somewhat toward the mathematician side of the spectrum. This is particularly unsettling because you can't predict what inputs would lead to failure. Because the probability of failure is low, people may feel safe in skipping checks to ensure this assumption hasn't been violated.

Adding some checks to make sure nothing bad has happened would probably solve most people's concerns. Sure, someday Linus might have to say "sorry, please add some whitespace to your patch, because it causes a hash collision with one of the old kernel patches", but at least it would be fixable if the problem can be detected.

So that is why Pedro shows no interest in getting into a talk about acceptable probabilities. The entire point is that algorithms should not be designed to fail unless there is a very good reason!


(Log in to post comments)

Living in a world of errors

Posted Apr 14, 2005 17:33 UTC (Thu) by shane (subscriber, #3335) [Link]

Hard disks (well IDE) uses CRC to check for bad data. I have no evidence to support me, but I'm guessing that it uses 16-bit or 32-bit CRC:

http://www.geocities.com/SiliconValley/Pines/8659/crc.htm

Mandelbrot proved that it is impossible to get a clean signal by improving the signal strength while at IBM, and engineers started introducing redundancy into transmissions. In some ways, this led to the necessity for packet-based transmission, and eventually to packet-switched networks, like the Internet:

http://www.fractalwisdom.com/FractalWisdom/fractal.html

In fact, TCP and UDP packets have checksums. Oh, and memory chips use parity to detect errors. Heck, CPU's have internal error detection these days.

Basically all of the technology that exists assumes and deals with failure and risk. When you start looking at failure rates with tens of zeros, engineers shrug their shoulders and move on with life.

Mathematicians pretend like their subject is somehow "pure", but consider that it took thousands of years before Euclid's 5th postulate was fully explored - arguably the most read book on mathematics ever!

Basically, you may not like the possibility of collisions from an aesthetic point of view, but it is not a real world problem. And like my daddy always said, there's no accounting for taste.

ideas are perfect, implementations are not

Posted Apr 14, 2005 18:21 UTC (Thu) by vgough (guest, #2781) [Link]

It is a very interesting topic. I'd say the biggest problem between
mathematicians and engineers is when one group thinks they can get along
without the other! Because in general, mathematicians don't build
bridges, and engineers don't lose much sleep over the limits of our
understanding of the universe :-)

I agree, the problem is mostly one of aesthetics. But we should hope
there will always be people around to ask if our current implementations
are ugly for a reason, or just for the sake of ugliness.


Living in a world of errors

Posted Apr 14, 2005 21:48 UTC (Thu) by swiftone (guest, #17420) [Link]

Hard disks (well IDE) uses CRC to check for bad data.

As the article mentions, _checking_ identity (e.g. checking for bad data) is different than _assigning_ identity. All of th examples you cite are checking validity. None are assigning identity. If you look at identity algorithms in the industy (UUID, etc) you'll find they are a different breed.

Living in a world of errors

Posted Aug 29, 2005 15:40 UTC (Mon) by lunakid (guest, #32146) [Link]

About checking and assigning (creating) identity being different:
Quite correct. The difference between the two is context. Identity checking always assumes/implies the more specific context of e.g. change characteristics. Identity assignment, OTOH, can basically only assume some upper bound of numbers in the supported pool of identities, and little more (some life-cycle information for aging may also be available etc.).

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds