
Great!

Posted May 16, 2015 1:03 UTC (Sat) by wahern (subscriber, #37304)
In reply to: Great! by tetromino
Parent article: Rust 1.0 released

That also means no Rc, Arc, or box, which are idiomatic. I'm also unsure whether there are hidden problems with the dereference operator and coercions.

In any event, if the accepted practice for the Rust community is to simply ignore OOM, then that would suck. It's generally understood to be best practice in the Unix world for _libraries_ to always handle OOM. Applications can choose to abort or recover, but at least they have the choice.

One of the things I like about working with Lua is that Lua can propagate OOM while maintaining VM consistency. That kind of attention to detail shows a concern for code correctness. Like Perl, Python, or Rust, they could have punted, but they didn't. That means _I_ get to make the choice of how to handle OOM, based on the particular constraints and requirements of my project. And it means library authors who bothered to worry about OOM haven't wasted their time.
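
For instance, through the C API an allocation failure inside a protected call comes back as LUA_ERRMEM with the VM still usable; a minimal sketch (run_chunk is an illustrative name):

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>

int run_chunk(lua_State *L, const char *chunk)
{
    if (luaL_loadstring(L, chunk) != 0)
        return -1;
    switch (lua_pcall(L, 0, 0, 0)) {
    case 0:
        return 0;                 /* success */
    case LUA_ERRMEM:
        /* OOM propagated to the host; the VM is still consistent,
           so the host chooses whether to recover or abort. */
        fprintf(stderr, "out of memory\n");
        return -1;
    default:
        fprintf(stderr, "error: %s\n", lua_tostring(L, -1));
        lua_pop(L, 1);
        return -1;
    }
}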



Great!

Posted May 16, 2015 12:03 UTC (Sat) by ms_43 (subscriber, #99293) [Link] (15 responses)

If you find a library that _actually_ handles OOM properly, instead of just adding 30% more buggy, untested code in a failed attempt to do so that crashes so often there's no real benefit, consider yourself lucky.

The only way it could actually work in practice is if you have a robust test suite that achieves 100% code coverage of the error paths via fault injection (returning NULL for every memory allocation call site).
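
A sketch of what such injection can look like (test_malloc and test_fail_at are made-up harness names): run the whole suite with the failure point at 1, 2, 3, ... so that every allocation call site gets to fail exactly once.

#include <stdlib.h>

static unsigned long alloc_count;
static unsigned long fail_at;       /* 0 disables injection */

void test_fail_at(unsigned long n) { alloc_count = 0; fail_at = n; }

void *test_malloc(size_t size)
{
    if (fail_at != 0 && ++alloc_count == fail_at)
        return NULL;                /* injected OOM */
    return malloc(size);
}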

Here's the story of libdbus, which tried to do this:

http://blog.ometer.com/2008/02/04/out-of-memory-handling-...

Great!

Posted May 16, 2015 16:18 UTC (Sat) by ncm (guest, #165) [Link] (3 responses)

When the response to failure is to run destructors, then the failure response code is exercised on every run of the program. In a well-designed system, the code that runs only on failure is very small, and runs on all failures, so is easy to exercise.

In Rust, which doesn't have exception handling, programs idiomatically use standard library components to effectively emulate exception handling. These library components are simple and well tested. It does depend on the whole system returning Result and Option objects for operations that can fail. I don't know how disruptive it would be to change a low-level function that once could not fail into one that returns a Result.

It is worth noting that the 1.0 release asserts a stable core language, but the library is not declared stable.

Great!

Posted May 16, 2015 21:57 UTC (Sat) by roc (subscriber, #30627) [Link] (2 responses)

The problem is that in most languages, error paths tend to leave your state in a half-done mess.

Also, you'd better be sure your destructors don't trigger allocation directly or indirectly.

Neither of those issues is tested "on every run of the program".

Great!

Posted May 19, 2015 13:34 UTC (Tue) by dgm (subscriber, #49227) [Link] (1 responses)

This is hardly a problem with the language, but with the code written in it, or more exactly, with the coder who wrote it. It's not difficult (even if tedious) to keep your invariants in mind and rewind changes in case of failure. In my experience the easiest way to do it is to follow commit semantics (which, for the uninitiated, means not changing an object's state until no exception can be raised), as sketched below.
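
A minimal sketch of commit semantics in C (struct config and config_set are hypothetical names):

#include <stdlib.h>
#include <string.h>

struct config { char *name; char *path; };

int config_set(struct config *cfg, const char *name, const char *path)
{
    /* Phase 1: everything that can fail touches only locals. */
    char *new_name = strdup(name);
    if (new_name == NULL)
        return -1;
    char *new_path = strdup(path);
    if (new_path == NULL) {
        free(new_name);
        return -1;
    }

    /* Phase 2: commit. Plain pointer swaps cannot fail, so on any
       error path above, *cfg was never modified. */
    free(cfg->name);
    free(cfg->path);
    cfg->name = new_name;
    cfg->path = new_path;
    return 0;
}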

Great!

Posted May 21, 2015 5:08 UTC (Thu) by roc (subscriber, #30627) [Link]

In practice, following commit semantics for large complex modules is too difficult and/or too expensive at run-time.

Ultimately all bugs are "problems with the coder that wrote it", but coders aren't perfect and their time isn't free, so making coding easier matters.

Great!

Posted May 16, 2015 16:24 UTC (Sat) by cesarb (subscriber, #6266) [Link]

Obligatory quote (from http://www.multicians.org/unix.html):

> We went to lunch afterward, and I remarked to Dennis that easily half the code I was writing in Multics was error recovery code. He said, "We left all that stuff out. If there's an error, we have this routine called panic, and when it is called, the machine crashes, and you holler down the hall, 'Hey, reboot it.'"

Great!

Posted May 18, 2015 13:01 UTC (Mon) by ibukanov (subscriber, #3942) [Link] (9 responses)

> a robust test suite that achieves 100% code coverage of the error paths via fault injection

It is trivial to get 100% coverage by changing error-handling style. Consider, for example, numerical floating-point calculations. Typically such code does not check for overflow/underflow errors and instead relies on NaN propagation. Such code simply has no separate error paths, so you get full error-path coverage automatically as long as the code is tested at all. The drawback, of course, is that NaN issues are harder to debug due to poor tooling support, but the code itself is robust.
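
A minimal sketch of that style (mean_ratio is a made-up example): no per-operation error checks; a NaN arising anywhere flows through the arithmetic and is tested once at the end.

#include <math.h>
#include <stdio.h>

static double mean_ratio(const double *num, const double *den, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += num[i] / den[i];   /* 0.0/0.0 yields NaN and propagates */
    return sum / n;
}

int main(void)
{
    double a[] = { 1.0, 0.0 }, b[] = { 2.0, 0.0 };
    double r = mean_ratio(a, b, 2);
    if (isnan(r))                 /* single error check for the whole path */
        fprintf(stderr, "calculation failed\n");
    return 0;
}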

It is possible to use this style for non-numerical code as well, but the problem is that language and library support for it is typically non-existent.

Great!

Posted May 18, 2015 16:44 UTC (Mon) by tterribe (guest, #66972) [Link] (8 responses)

Until you actually inject NaNs, you don't know what the code will do in that case. Simply reasoning that it follows the same control flow doesn't mean it will do something good, and it's not even obvious that it will follow the same control flow if there are any comparisons against numerical values, conversions to int, etc. Sure, you may have hit every line of code, but that doesn't mean the code always works.

Some examples:
https://git.xiph.org/?p=opus.git;a=commitdiff;h=d6b56793d...
https://git.xiph.org/?p=opus.git;a=commitdiff;h=58ecb1ac1...

I agree with ms_43: the only way this works is if you actually test with fault-injection. We had next to no allocations in libopus (just a few mallocs when setting up the encoder and decoder, absolutely nothing when encoding or decoding a frame), and we _still_ had to do this to catch our bugs, and it's not like this is our first project in C.

Great!

Posted May 18, 2015 17:22 UTC (Mon) by ibukanov (subscriber, #3942) [Link] (7 responses)

> Some examples:
> https://git.xiph.org/?p=opus.git;a=commitdiff;h=d6b56793d...
> https://git.xiph.org/?p=opus.git;a=commitdiff;h=58ecb1ac1...

These bugs happen when the NaN model interacts with code that does not follow it, which nicely demonstrates my point about poor language/library support for extending that style to the rest of the code.

Now consider what happens to the code with those bugs when it is compiled for the asm.js target, which defines precisely what happens when code accesses unallocated memory: a sort of NaN for pointers. The end result would be no crash and no possibility of arbitrary code execution, but rather a corrupted video frame.

Great!

Posted May 18, 2015 20:51 UTC (Mon) by tterribe (guest, #66972) [Link] (6 responses)

Audio, you mean. But considering that at least one of these is an encoder bug, a corrupt frame is not precisely a good outcome. There's still a bug; it's just now harder to find and fix, because you've removed causes even further from effects. Not every bug is about memory safety.

But I don't even buy the argument about interacting with "code that does not follow [the NaN model]". Take the first change for example:

- if (x>=8)
+ /* Tests are reversed to catch NaNs */
+ if (!(x<8))

The behavior differs precisely because and only when the comparison follows the NaN model, and while in this case the difference happened to lead to a crash later on because of a float->int conversion, there are plenty of other cases where it would simply produce a wrong result. I do not believe that every time a developer writes a comparison against a float, they are asking themselves, "What happens if one of these values is NaN (or Inf)?" and even when they do ask, that they reason correctly about what *should* happen without testing it. Think about things like convergence tests, etc., that could lead to infinite loops. There's no "NaN for pointers" that is going to fix that.
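
To make that concrete: under IEEE 754, every ordered comparison involving NaN is false, so the two spellings of the test diverge exactly when x is NaN.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = NAN;
    printf("%d\n", x >= 8);    /* 0: NaN falls through to the other path */
    printf("%d\n", !(x < 8));  /* 1: the reversed test routes NaN here   */
    return 0;
}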

Great!

Posted May 18, 2015 21:16 UTC (Mon) by ibukanov (subscriber, #3942) [Link] (5 responses)

> it's just now harder to find

This is a tooling issue. An implementation can generate a stack trace when it produces a NaN for the first time.
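
For example (glibc-specific; feenableexcept() is a GNU extension), one can trap the first NaN-producing operation so a debugger or signal handler shows where it came from:

#define _GNU_SOURCE
#include <fenv.h>

int main(void)
{
    feenableexcept(FE_INVALID);       /* SIGFPE on invalid ops like 0.0/0.0 */

    volatile double zero = 0.0;
    volatile double nan = zero / zero;  /* traps here, not at some later use */
    (void)nan;
    return 0;
}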

> The behavior differs precisely because and only when the comparison follows the NaN model,

The NaN model in this case does not add an extra branch. If one has test coverage of both branches for normal values, that coverage exercises the NaN case as well.

> Think about things like convergence tests, etc., that could lead to infinite loops.

I can trivially kill a media application that is stuck in an infinite loop. Compare that with a bug that leads to arbitrary code execution from a randomly downloaded media file. These outcomes are vastly different in their consequences. Similarly, compare the same buggy C code that corrupts memory when compiled as asm.js and run in a browser (effectively forcing something like a NaN model for C pointers) with the same code run as a native desktop application. I would vastly prefer to experience the bug in the former incarnation rather than the latter.

In general I do not claim that the NaN model leads to fewer bugs. Rather, the claim is that the total cost of the consequences of those bugs is lower.

Great!

Posted May 19, 2015 14:38 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (4 responses)

> If one has a test coverage for both branches for normal code, that test coverage covers NaN case as well.

I think the real point here is that branch coverage is _insufficient_. The tests should exercise the boundaries of all of the "equivalence classes", groups of inputs which are expected to cause similar behavior in the code. That includes covering all the branches, but in this case a NaN input—or an input which can cause NaN in an intermediate calculation—would also be a separate equivalence class, even if the same branches are used (because the real branches are hidden in the ALU hardware).

NaN inputs, or ranges of inputs which result in NaN, represent discontinuities in the program, and discontinuities need to be tested. Branches are merely a special case of this rule, discontinuities in a piecewise-defined function which determines the control flow.

Great!

Posted May 19, 2015 16:40 UTC (Tue) by ibukanov (subscriber, #3942) [Link] (3 responses)

> The tests should exercise the boundaries of all of the "equivalence classes"

Ideally this should be expressed by types. For example, consider the following fragment which should report an error if z becomes NaN, not only when it is negative:

double x, y, z;
x = f1();
y = f2();
z = x * y;
if (z < 0) return error;

With a better type system, the relational operators could be applied only to non-NaN doubles, requiring one to write something like:

double x, y, z;
x = f1();
y = f2();
z = x * y;
if (isNaN(z) || z.asDefinedNumber() < 0) return error;

Now, this looks like a contradiction of my assertion that the NaN model reduces the number of branches and the number of tests, but consider what happens if NaN were not supported. Then the code would follow the common C practice of returning a false value to indicate an error:

double x, y, z;
if (!f1(&x)) return error;
if (!f2(&y)) return error;
if (!mult(x, y, &z)) return error;
if (z < 0) return error;

Notice that there are now 5 branches rather than the 3 with NaN. This reduction comes from multiplication having well-defined semantics for NaN values. In turn, this translates directly into simpler test coverage: to exercise the error path with NaN values it is sufficient to arrange for f1 to return NaN, rather than making f1, f2, and mult each return false.

Great!

Posted May 19, 2015 19:06 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

> ... but consider if NaN would not be supported.

I don't think anyone has really suggested removing support for NaN—sum types of this kind are indeed very useful for error handling and other tasks—but merely that the NaN cases need to be tested separately from the non-NaN cases.

> ... In turn this directly translates into simpler test coverage as to test the code path leading to error with NaN values it is sufficient to arrange for f1 to return NaN, rather than making f1, f2 and mult to return false.

Treating f1() and f2() as the inputs, I would define six equivalence classes based on the _product_ of the inputs, z, and how z is used in the condition: -Inf, <0, 0, >0, +Inf, NaN. After all, the point of the exercise is not to test the machine's multiplication algorithm, and the behavior of the code depends only on the product and not the individual inputs. Of course, this presumes white-box testing; unless the specifications are unusually detailed, a black-box tester wouldn't be able to assume the use of the multiplication primitive and would therefore need more test cases to cover different combinations of inputs.

Full branch coverage would only require two tests, but that isn't enough to show that the condition is implemented correctly for +/-Inf, NaN, or zero, each of which could easily exhibit incorrect behavior without suggesting deliberate malice on the part of the implementer.
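
A sketch of those six classes driven through the condition from the earlier fragment ("if (z < 0) return error;"); note that the NaN case silently passes, which is exactly the hole under discussion:

#include <assert.h>
#include <math.h>

static int reports_error(double z) { return z < 0; }

int main(void)
{
    assert(reports_error(-INFINITY));   /* -Inf      */
    assert(reports_error(-1.0));        /* negative  */
    assert(!reports_error(0.0));        /* zero      */
    assert(!reports_error(1.0));        /* positive  */
    assert(!reports_error(INFINITY));   /* +Inf      */
    assert(!reports_error(NAN));        /* NaN: z < 0 is false, so the
                                           error goes unreported */
    return 0;
}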

Great!

Posted May 20, 2015 11:58 UTC (Wed) by paulj (subscriber, #341) [Link] (1 responses)

> Ideally this should be expressed by types.

E.g., Monads? Isn't this what they were invented for?

Great!

Posted May 20, 2015 13:06 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

> E.g., Monads?

Yes, ideally monads as implemented, if not as in Koka [1], then at least as in PureScript [2]. Haskell's type system is not powerful enough to express many useful idioms that are required by a systems language, or by a language that needs to interface heavily with code in other languages. But even just supporting kind-2 types and some syntactic sugar would help Rust a lot.

[1] - http://research.microsoft.com/en-us/projects/koka/
[2] - http://www.purescript.org/

