Risk report: Not only numbers count

Posted Feb 27, 2008 11:09 UTC (Wed) by chel (guest, #11544)
Parent article: Risk report: Three years of Red Hat Enterprise Linux 4

Although it is a good track record and the numbers are quite manageable, I don't think it is a
good idea in general to express system security in these kinds of manageable numbers. Remember:
the Titanic had only two flaws, the wrong steel and too few lifeboats.



Risk report: Not only numbers count

Posted Feb 27, 2008 13:34 UTC (Wed) by tialaramex (subscriber, #21167)

Hmm, I'm sure we can come up with a pretty huge number of problems even just relating to that
one incident.

* Improper lifeboat training for the crew (and none at all for passengers) meant that most boats
were sent away half-empty. Almost twice as many people could have survived if every boat had
been filled. Because of this, many would have drowned even if Titanic had carried "sufficient"
boats.

* Lack of engineering experience with huge ships meant that the rudder was too small.
Hence Titanic could not turn as quickly to avoid the iceberg as other ships could have in the
same circumstances.

* Cost and weight considerations meant Titanic did not have full-height bulkheads. A smaller
ship owned by the same company had remained seaworthy after losing its entire bow, because
full-height bulkheads prevented water from flooding the rest of the vessel.

* Lack of communications standards meant that the sailor on watch who saw distress flares go
up from Titanic dismissed them as celebrations.

* Poor navigational fixes meant that Titanic's position was misreported, delaying rescue for
those who hadn't drowned and thus causing further loss of life.

Risk report: Not only numbers count

Posted Feb 27, 2008 14:46 UTC (Wed) by chel (guest, #11544)

Well, I hope the message is clear. For flaws, the number matters less than the impact. If
you fix a flaw only minutes after it has caused a meltdown at a nuclear plant, it is still too
late.

Many of those manager-oriented statistics have limited value in real life. They are numbers,
but you cannot and should not do arithmetic with them. To take an example: for a car,
failing brakes are a real problem, while one broken light bulb out of the two in a rear light
is not. You should not add the two up to draw a conclusion. If you do, you may end up driving a
car with only one flaw: no brakes.
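A minimal Python sketch of the same point, using two hypothetical cars and an invented severity scale (the flaw names and the scores 10 and 1 are assumptions for illustration only). Counting flaws ranks the car with no brakes as "better"; looking at the worst flaw does not.

```python
# Hypothetical severity scale (assumption): higher means worse.
SEVERITY = {"critical": 10, "minor": 1}

# Two hypothetical cars, each described by its list of known flaws.
car_a = [("no brakes", "critical")]
car_b = [("broken rear bulb, left", "minor"), ("broken rear bulb, right", "minor")]

def flaw_count(flaws):
    """Naive metric: just count the flaws."""
    return len(flaws)

def worst_flaw(flaws):
    """Impact-oriented metric: severity of the single worst flaw (0 if none)."""
    return max((SEVERITY[level] for _, level in flaws), default=0)

for name, flaws in [("car A", car_a), ("car B", car_b)]:
    print(f"{name}: {flaw_count(flaws)} flaw(s), worst severity {worst_flaw(flaws)}")
# car A: 1 flaw(s), worst severity 10
# car B: 2 flaw(s), worst severity 1
```

Any real scoring scheme would need its own severity scale; the point here is only that summing or counting flaws hides the one that matters.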

BTW, I have no problems with RH product quality. I started using RH products in
mission-critical applications over 10 years ago.

Risk report: Not only numbers count

Posted Feb 27, 2008 21:17 UTC (Wed) by proski (subscriber, #104)

A single flaw would not normally cause anything as bad as a reactor meltdown. Systems where safety and security are paramount are designed in such a way that a single flaw doesn't cause a complete failure. For instance, a firewall protects against outside access, services require authentication to keep strangers away, there is strict separation between users and root, the system is monitored, the essential data is stored on a separate system, and there are offsite backups.

In real life, there are outcomes other than complete success and total failure. Even on the Titanic, there were survivors.

Security experts are more categorical when it comes to vulnerabilities in individual packages, because it's better to be safe than sorry. Yet it's entirely justified for a distribution to post general statistics. They measure how diligent the distribution has been at fixing issues in the packages it ships. They can be related to the security of individual systems, but the relationship is not a trivial one.

Risk report: Not only numbers count

Posted Feb 27, 2008 22:12 UTC (Wed) by chel (guest, #11544)

I suggest http://www.securityfocus.com/news/8412 for further reading, especially the part
about the race condition bug that blinded the control system and played a major role in the
NE blackout.

For me, open source design, together with bug-finding projects in several places, is much more
important than how quickly bugs are fixed after they have been published. Not every bug is
published before a disaster. For the NE blackout: "About eight weeks after the blackout, the bug
was unmasked as a particularly subtle incarnation of a common programming error"

The best place and time to find and correct bugs is on your desk before damage is visible. OSS
helps to do that.

My problem with these kinds of statistics is that they steer the discussion toward the kind of
debate we have about OSes that fix flaws in virus checkers.


Risk report: Not only numbers count

Posted Feb 27, 2008 18:10 UTC (Wed) by tzafrir (subscriber, #11501)

Did you read the full article before posting that?

I found that article and the previous "Risk reports" interesting reading,
even though I normally use a different distro.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds