LWN.net Logo

What every C Programmer should know about undefined behavior #2/3

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 21:08 UTC (Mon) by HelloWorld (guest, #56129)
In reply to: What every C Programmer should know about undefined behavior #2/3 by nix
Parent article: What every C Programmer should know about undefined behavior #2/3

> We'll never be free of such problems. After all, we also have no good way to prove that an arbitrary application will halt
This is, and has always been, a bullshit argument. The fact that we can't write a program that will tell us whether some program will halt doesn't mean that we can't prove it for some specific program we care about.


(Log in to post comments)

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 1:32 UTC (Tue) by wahern (subscriber, #37304) [Link]

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 5:26 UTC (Tue) by cmccabe (guest, #60281) [Link]

Klee is cool; thanks for sharing that. To be fair, though, it's not a tool for proving programs correct; it's a tool for generating lots and lots of test cases automatically.

There was another tool for Ruby that would randomly alter your code at runtime (!) and see how you handled the resulting errors. I am having a really hard time remembering the name, though...

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 10:26 UTC (Tue) by pager2 (guest, #72197) [Link]

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 18:35 UTC (Tue) by wahern (subscriber, #37304) [Link]

Correct me if I'm wrong, but symbolic execution engines are what you get when you don't allow the halting problem to dissuade you from analyzing the code, and KLEE is a symbolic execution engine that generates test cases for interesting paths that it has deduced, such as paths to a bug. If a program is small enough KLEE can find many paths which result in the termination of the program, but for large applications KLEE can run for hours or days. In fact, it could run forever, AFAIU, if its heuristics don't terminate the algorithm. This is why it allows you to specify different starting points so you can get complete coverage over more interesting sub-components of an application.

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 21:16 UTC (Tue) by kleptog (subscriber, #1183) [Link]

Indeed, the halting problem is (amongst other things) just a formal way of saying you can't write a universal theorem prover. Imagine, you could write a program that terminates on the first odd perfect number, or on an unexpected zero of the Riemann zeta function and your "halting problem solver" would give you the answer to questions mathematicians having been breaking their heads over for generations.

It's not clear that the programs we write are anywhere near that kind of complexity. I feel it should be possible to prove some useful properties of programs but we miss some infrastructure for describing the things we want to prove. You can say "all strings must be UTF-8", but how can you explain that to a computer? There's always exceptions to deal with (the result of read() for example).

It's pointed elsewhere in this thread that want you want is to check the program against a model. The problem is, we (often) don't have the model written in a formal way. So we might need a program that tries to determine the model, which we humans then check and which can then be used to check the program. So you can get messages like: function foo is always called with a UTF-8 string, except for that call over there.

Perhaps this can happen during development, so you get prompted: method foo was always called with object bar, and now also with object baz, correct y/n? Depending on the result the model is updated. The question is, can you make this so it doesn't get in your way too much.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 14:20 UTC (Wed) by jd (guest, #26381) [Link]

I'd probably start with Z or ObjectZ because these are well-studied. Most, if not all, programs that can be written must be translatable into the appropriate Z notation, albeit with unknowable complexity as Z isn't terribly compact.

The reason for using something like Z is that it is implementation-independent, so it doesn't make any difference if you're writing in C, C++, LISP or Prolog. There will be a valid mapping.

The disadvantage of Z is that in the same way the same specification can produce many implementations, the same implementation can produce many specifications. These will not be of equal use and I know of no easy way to generate the specification in a way that guarantees usefulness.

(For those more familiar with Z as the starting point, I'm totally inverting the flow. This may shock some. Sorry. With this scheme, I'm solely concerned with back-engineering what the specification of the code is from the code, not in generating code from a specification.)

The rationale is that Z is intended to be easier to validate than code. Easier, not easy. It's still hard work. But doing the validation (which is the hardest part) in a single frame of reference and converting to it from the different langauges (which is hard but not as bad) should require less work than writing a validator specific to each language.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 20:10 UTC (Wed) by njs (guest, #40338) [Link]

> Indeed, the halting problem is (amongst other things) just a formal way of saying you can't write a universal theorem prover. Imagine, you could write a program that terminates on the first odd perfect number, or on an unexpected zero of the Riemann zeta function and your "halting problem solver" would give you the answer to questions mathematicians having been breaking their heads over for generations.

That isn't quite right. You can easily write a program that answers those questions *if they have an answer* -- the halting problem is related to the fact that there might not be an answer at all. It's entirely possible that those particular questions do have answers, though, which case a computer program could find it. But this doesn't help much in practice because that computer program probably won't finish until sometime after the heat-death of the universe.

> You can say "all strings must be UTF-8", but how can you explain that to a computer? There's always exceptions to deal with (the result of read() for example).

Oh, but actually this is easy! Any reasonably competent statically-typed language can do this. E.g., in C++, you define a type "utf8_string", and you make sure that any publicly accessible way of constructing an instance does appropriate validity checking. Then code that assumes valid utf8 just declares that it takes input of type utf8_string. read(), of course, doesn't return a utf8_string, so if you want to read a string and then you want to pass it to some code that assumes utf8 (maybe somewhere else entirely in your program, after its been passed through 10 other functions), the compiler won't let you unless you sanitize it first. And as you refactor your APIs, the conversion naturally gets moved around to be in the right place.

It works great. But I've only seen one project that used C++ like this. It's very sad :-(.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 11:34 UTC (Wed) by Wol (guest, #4433) [Link]

>This is, and has always been, a bullshit argument. The fact that we can't write a program that will tell us whether some program will halt doesn't mean that we can't prove it for some specific program we care about.

Just because you've proved it in the imaginary world of maths doesn't mean diddley-squat. The ONLY valid proof in the real world is "well, we haven't been wrong yet - but there's always a first time ...".

Cheers,
Wol

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 14:22 UTC (Wed) by jd (guest, #26381) [Link]

Well, all programs are equal to a set of mathematical expressions. That can be proven and may well form a part of the battle against software patents since you can't patent mathematical expressions.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds