LWN.net Logo

Testing

Testing

Posted May 21, 2008 8:34 UTC (Wed) by bvdm (guest, #42755)
In reply to: Testing by eru
Parent article: Open Source Security Report

There are existing statistical test suites that check for randomness and this is a well
explored research area. Hardware (true) RNG generators typically include such tests to guard
against physical failure. 

Given the near ubiquitous of OpenSSL I consider it a shame that such tests are not included in
the OpenSSL test suite. The failure mode in the Debian case was so bad that I would venture
that even a simple statistical test should have picked it up. Cost due to speed of execution
of the test cannot compare to the cost of failure for something this important!

Anyone interested in the details should consult NIST Special Publication 800-22 "A Statistical
Test Suite for Random and Pseudorandom Number Generators".


(Log in to post comments)

Testing

Posted May 21, 2008 9:23 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

The Debian output should (I haven't checked) pass this test, indeed OpenSSL may well include
such a statistical test and it wouldn't have changed a thing.

Unless NIST specifies a protocol whereby a large number of such RNGs are tested and the
results compared to ensure they are different (which can't be used in a unit test since only
one sample is available), it doesn't detect the Debian bug because that bug only removes the
entropy from a PRNG seed, and the PRNG output is still pseudo-random, thus passing the test.

Statistical tests don't, and indeed can't prove that something is random in the sense needed
for most cryptography. They can only prove that the data isn't correlated by the known types
of relation handled by the analysis. A program which always outputs...

0, 4, 8, 8, 1, 5, 0, 7

is exactly as "random" as the 10-sided die I just used to actually generate that sequence, so
far as a statistical analysis is concerned. Yet for cryptography rolling dice is acceptable
(if you do it right) while a program which always outputs that sequence is useless.

Testing

Posted May 21, 2008 12:56 UTC (Wed) by bvdm (guest, #42755) [Link]

Okay, I was not clear enough.

The function of a PRNG is of course to take a small number of bytes of high entropy and turn
them into a much longer stream of bytes that has much lower entropy per byte. This is
necessary because true high entropy is scarce. So a PRNG solves a "supply and demand" problem.
And of course Linux has both /dev/random and /dev/urandom which allows a choice of how much
entropy you are guaranteed to get.

What I meant was that OpenSSL should have tests where it statistically verifies the quality of
its *entropy input*, not the output.

This is what hardware RNG's do, they run statistical tests on their entropy input to verify
that physical failure of the high entropy phenomenon that is being measured (radioactivity
etc.) does not destroy the device's security claims. Hardware devices need to do this because
the physical entropy source is typically a single point of failure whereas software PRNG's
rely on multiple sources (keyboard strokes, network packet arrival times etc.)

The Debian OpenSSL bug went undetected because OpenSSL apparently  has no test of the entropy
input similar to what hardware RNG's have. I sure that the upstream team carefully verified
the mechanism, but given the subtleness by which the bug was introduced, it warrants extra
precautions for the future.


Input was perfect!

Posted May 21, 2008 13:42 UTC (Wed) by khim (subscriber, #9252) [Link]

There are nothing subtle there. OpenSSL used very good source of high entropy: /dev/random. Also there was good PRNG to produce a lot of lower quality entropy. The thing that was at fault was tiny procedure responsible to transfer high entropy to the PRNG pool. In the end it just ignored good source of entropy but shook the pool. So verification of input will be useless: input was not at fault. And verification of output will be hard (as discussed above).

Input was perfect!

Posted May 22, 2008 3:07 UTC (Thu) by bvdm (guest, #42755) [Link]

You are being disingenuous. Tests can be added at any or multiple levels. And the bug was
subtle, just reading the actual code (as in a previous LWN article) does not raise any
immediate suspicions.

Testing

Posted May 21, 2008 12:58 UTC (Wed) by bangert (subscriber, #28342) [Link]

writing a test that ensures a PRNG is indeed a PRNG is dificult.

the debian openssl bug however, could have been found by a test that 
ensures that the PRNG generates more than 16bits of random data. i suppose 
checking for 28 or 29 bits should still be pretty fast. 16 bits is abysmal 
and should be caught immediately.

also the slowness factor is really not a problem for a binary 
distribution. i wouldnt mind if openssl had an extended testsuite which 
ran for a couple of days...

Testing

Posted May 21, 2008 13:13 UTC (Wed) by bvdm (guest, #42755) [Link]

The NIST document describes the range of available statistical tests for PRNG's well enough,
but that's not what I am suggesting.

The OpenSSL bug was made possible because OpenSSL has its own layer of entropy processing on
top of sources such as /dev/random on Linux. This is because OpenSSL needs to support
platforms where /dev/random is not available. So even in the presence of a the high quality
entropy source /dev/random on Debian, a bug in OpenSSL negated that entropy. So I am arguing
that OpenSSL needs to have a test suite on top of their entropy stack to detect future bugs.

That's not what these tests do

Posted May 21, 2008 14:07 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Here are two data sets, one has 8-bits of true random data input, the other has 16-bits of
true random data input, but like the Debian OpenSSL implementation in each case the actual
data is PRNG output, just one has a smaller range of possible seed values. Simply reply
showing how you algorithmically determined which was which with the generous list of values I
provide...

(Or since that's practically impossible for the reasons I already explained, go away and learn
about PRNGs and seeding and make sure you don't touch any crypto software)

set 1: 9, 89, 64, 12, 49, 19, 72, 28, 34, 6, 9, 31, 65, 73, 17, 28

set 2: 9, 50, 30, 22, 36, 89, 29, 7, 82, 65, 77, 30, 42, 54, 83, 77

I am getting bored of having to roll dice to prove my point here. Please, further
contributions only from people who've both looked at the code in question and already know a
little about the subject.

The tests don't need to do that

Posted May 21, 2008 21:34 UTC (Wed) by man_ls (subscriber, #15091) [Link]

There is a possible mechanism to check not the input, but the output of ssh-keygen. The test could only generate a certain number of keys and check that none of them are repeated. With the Debian hole you would have to generate at most 2^16 keys to find a collision. Given that it took a few hours to generate all the compromised keys, we might expect that the whole process shouldn't last much more than that. Of course the generation might be optimized. In fact on my machine ssh-keygen for RSA with the default key length takes less than 2 seconds; so creating 65536 keys might be done in about 36 hours.

But the process might even be quite shorter. Given that a birthday attack is possible, only about the square root of the number of possible keys is needed. Even less if the keys are not evenly distributed, which was the case. So in this case less than 8 minutes of random generation of keys might have found the security hole, on my hardware. It should be trivial to check this approach in practice, wouldn't it?

A long test ahead

Posted May 21, 2008 23:58 UTC (Wed) by man_ls (subscriber, #15091) [Link]

Uh... I should have read to the end of the comments first, sorry for that. I see that below you mention the birthday attack, and then discard it because the only entropy comes from the process ID which will not repeat itself until it rolls around. So the PRNG is fed with sequential data, and is not random enough even to create a birthday collision.

The time to generate the compromised keys is just a couple of hours -- on a 31-processor cluster. It makes sense that on my lowly AMD64@2.0 GHz it takes about 36 hours. I just checked that generating a few thousand keys does not provide a collision. I will let it run for a while until PID wraps, and then check if there are indeed collisions. But you are right that it is hardly a unit test that which might take longer than a day.

A long test ahead

Posted May 22, 2008 6:08 UTC (Thu) by man_ls (subscriber, #15091) [Link]

After running overnight, my little script is already generating duplicate keys as expected. (PID wrapped around some hours ago.) This with libssl0.9.8 version 0.9.8g-1.

One wonders if such a verification would really be useful for a distro, or if it is indeed "fighting yesterday's battle".

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds