LWN.net Logo

What every C Programmer should know about undefined behavior #2/3

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 21:16 UTC (Tue) by kleptog (subscriber, #1183)
In reply to: What every C Programmer should know about undefined behavior #2/3 by HelloWorld
Parent article: What every C Programmer should know about undefined behavior #2/3

Indeed, the halting problem is (amongst other things) just a formal way of saying you can't write a universal theorem prover. Imagine, you could write a program that terminates on the first odd perfect number, or on an unexpected zero of the Riemann zeta function and your "halting problem solver" would give you the answer to questions mathematicians having been breaking their heads over for generations.

It's not clear that the programs we write are anywhere near that kind of complexity. I feel it should be possible to prove some useful properties of programs but we miss some infrastructure for describing the things we want to prove. You can say "all strings must be UTF-8", but how can you explain that to a computer? There's always exceptions to deal with (the result of read() for example).

It's pointed elsewhere in this thread that want you want is to check the program against a model. The problem is, we (often) don't have the model written in a formal way. So we might need a program that tries to determine the model, which we humans then check and which can then be used to check the program. So you can get messages like: function foo is always called with a UTF-8 string, except for that call over there.

Perhaps this can happen during development, so you get prompted: method foo was always called with object bar, and now also with object baz, correct y/n? Depending on the result the model is updated. The question is, can you make this so it doesn't get in your way too much.


(Log in to post comments)

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 14:20 UTC (Wed) by jd (guest, #26381) [Link]

I'd probably start with Z or ObjectZ because these are well-studied. Most, if not all, programs that can be written must be translatable into the appropriate Z notation, albeit with unknowable complexity as Z isn't terribly compact.

The reason for using something like Z is that it is implementation-independent, so it doesn't make any difference if you're writing in C, C++, LISP or Prolog. There will be a valid mapping.

The disadvantage of Z is that in the same way the same specification can produce many implementations, the same implementation can produce many specifications. These will not be of equal use and I know of no easy way to generate the specification in a way that guarantees usefulness.

(For those more familiar with Z as the starting point, I'm totally inverting the flow. This may shock some. Sorry. With this scheme, I'm solely concerned with back-engineering what the specification of the code is from the code, not in generating code from a specification.)

The rationale is that Z is intended to be easier to validate than code. Easier, not easy. It's still hard work. But doing the validation (which is the hardest part) in a single frame of reference and converting to it from the different langauges (which is hard but not as bad) should require less work than writing a validator specific to each language.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 20:10 UTC (Wed) by njs (guest, #40338) [Link]

> Indeed, the halting problem is (amongst other things) just a formal way of saying you can't write a universal theorem prover. Imagine, you could write a program that terminates on the first odd perfect number, or on an unexpected zero of the Riemann zeta function and your "halting problem solver" would give you the answer to questions mathematicians having been breaking their heads over for generations.

That isn't quite right. You can easily write a program that answers those questions *if they have an answer* -- the halting problem is related to the fact that there might not be an answer at all. It's entirely possible that those particular questions do have answers, though, which case a computer program could find it. But this doesn't help much in practice because that computer program probably won't finish until sometime after the heat-death of the universe.

> You can say "all strings must be UTF-8", but how can you explain that to a computer? There's always exceptions to deal with (the result of read() for example).

Oh, but actually this is easy! Any reasonably competent statically-typed language can do this. E.g., in C++, you define a type "utf8_string", and you make sure that any publicly accessible way of constructing an instance does appropriate validity checking. Then code that assumes valid utf8 just declares that it takes input of type utf8_string. read(), of course, doesn't return a utf8_string, so if you want to read a string and then you want to pass it to some code that assumes utf8 (maybe somewhere else entirely in your program, after its been passed through 10 other functions), the compiler won't let you unless you sanitize it first. And as you refactor your APIs, the conversion naturally gets moved around to be in the right place.

It works great. But I've only seen one project that used C++ like this. It's very sad :-(.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds