Great!

Posted May 18, 2015 21:16 UTC (Mon) by ibukanov (subscriber, #3942)
In reply to: Great! by tterribe
Parent article: Rust 1.0 released

> it's just now harder to find

This is a tooling issue. Implementation can generate a stacktrace when it generates NaN the first time.

> The behavior differs precisely because and only when the comparison follows the NaN model,

NaN model in this case does not add an extra branch. If one has a test coverage for both branches for normal code, that test coverage covers NaN case as well.

> Think about things like convergence tests, etc., that could lead to infinite loops.

I can trivially kill a media application that stuck in ∞ loop. Compare that with a bug that leads to arbitrary code execution coming from a randomly downloaded media file. These are vastly different outcomes in consequences. Similarly, compare the same buggy C code that corrupts memory compiled as asm.js and run in a browser (effectively forcing something like NaN model for C pointers) and run as a native desktop application. I personally would vastly prefer to experience the bug in its former incarnation rather than latter.

In general I do not claim that NaN model leads to less bugs. Rather the claim is that the total cost of consequences of those bugs is lower.

Great!

Posted May 19, 2015 14:38 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (4 responses)

> If one has a test coverage for both branches for normal code, that test coverage covers NaN case as well.

I think the real point here is that branch coverage is _insufficient_. The tests should exercise the boundaries of all of the "equivalence classes", groups of inputs which are expected to cause similar behavior in the code. That includes covering all the branches, but in this case a NaN input—or an input which can cause NaN in an intermediate calculation—would also be a separate equivalence class, even if the same branches are used (because the real branches are hidden in the ALU hardware).

NaN inputs, or ranges of inputs which result in NaN, represent discontinuities in the program, and discontinuities need to be tested. Branches are merely a special case of this rule, discontinuities in a piecewise-defined function which determines the control flow.

Great!

Posted May 19, 2015 16:40 UTC (Tue) by ibukanov (subscriber, #3942) [Link] (3 responses)

> The tests should exercise the boundaries of all of the "equivalence classes"

Ideally this should be expressed by types. For example, consider the following fragment which should report an error if z becomes NaN, not only when it is negative:

double x, y, z;
x = f1();
y = f2();
z = x * y;
if (z < 0) return error;

With better typesystem the relational operators could only be applied to non-NaN doubles requiring one to write something like:

double x, y, z;
x = f1();
y = f2();
z = x * y;
if (isNaN(t) || t.asDefinedNumber() < 0) return error;

Now, this looks like a contradiction to my assertion that NaN model reduces the number of branches and the number of tests, but consider if NaN would not be supported. Then the code becomes using a common C practice to return a false value to indicate an error:

double x, y, z;
if (!f1(&x)) return error;
if (!f2(&y)) return error;
if (!mult(x, y, &z)) return error;
if (z < 0) return error;

Notice that now there 5 branches rather than 3 with NaN. This reduction comes from the multiplication having well-defined semantic for NaN values. In turn this directly translates into simpler test coverage as to test the code path leading to error with NaN values it is sufficient to arrange for f1 to return NaN, rather than making f1, f2 and mult to return false.

Great!

Posted May 19, 2015 19:06 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

> ... but consider if NaN would not be supported.

I don't think anyone has really suggested removing support for NaN—sum types of this kind are indeed very useful for error handling and other tasks—but merely that the NaN cases need to be tested separately from the non-NaN cases.

> ... In turn this directly translates into simpler test coverage as to test the code path leading to error with NaN values it is sufficient to arrange for f1 to return NaN, rather than making f1, f2 and mult to return false.

Treating f1() and f2() as the inputs, I would define six equivalence classes based on the _product_ of the inputs, z, and how z is used in the condition: -Inf, <0, 0, >0, +Inf, NaN. After all, the point of the exercise is not to test the machine's multiplication algorithm, and the behavior of the code depends only on the product and not the individual inputs. Of course, this presumes white-box testing; unless the specifications are unusually detailed, a black-box tester wouldn't be able to assume the use of the multiplication primitive and would therefore need more test cases to cover different combinations of inputs.

Full branch coverage would only require two tests, but that isn't enough to show that the condition is implemented correctly for +/-Inf, NaN, or zero, each of which could easily exhibit incorrect behavior without suggesting deliberate malice on the part of the implementer.

Great!

Posted May 20, 2015 11:58 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

Ideally this should be expressed by types.

E.g., Monads? Isn't this what they were invented for?

Great!

Posted May 20, 2015 13:06 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

> E.g., Monads?

Yes, ideally Monads as implemented if not as in Koka [1] but at least as in PureScript [2]. Haskell typesystem is not powerful enough to express many useful idioms that are required by a system language or language where one need to interface a lot with code in other languages. But even just supporting kind-2 types and some syntax sugar should help Rust a lot.

[1] - http://research.microsoft.com/en-us/projects/koka/
[2] - http://www.purescript.org/