What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 21:10 UTC (Mon) by njs (subscriber, #40338)
In reply to: What every C Programmer should know about undefined behavior #2/3 by nix
Parent article: What every C Programmer should know about undefined behavior #2/3

It's not clear that we couldn't prove arbitrary non-trivial properties (including halting) about *any application we actually care about*, though.
For example, type systems (if properly abused) let you prove all sorts of interesting properties about global data flow (e.g., the property "no string ever goes from the user to the database without being sanitized"). There's no good reason why the behavior of your X server, kernel, web server, browser, etc. should ever be *uncomputable*.

Large-scale proofs about programs are very hard, but that's a tools problem, not a deep conceptual abyss where no-one should ever even try to tread. (And we'd probably have better tools -- and programming languages -- if people were less scared of the abyss.)

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 21:50 UTC (Mon) by cmccabe (guest, #60281) [Link] (23 responses)

Programming language researchers have been jumping into the "abyss" for decades now, and nobody has heard of them since. The most popular languages these days were started as hobby projects by professional developers (Ruby, Python, Perl) or funded by big companies that wanted to build out a platform (Java, C#).

In my humble (non-tenured) opinion, proofs belong in math class; model checking belongs in engineering. If you tried to prove any non-trivial thing about your life, you would quickly find it impossible.

Imagine trying to prove that you were going to go to the store and return with a gallon of milk. It sounds simple, but consider: how can you prove that you won't have a car crash or a heart attack? Even if none of those disasters happen, the store might be closed. The kind of milk you want might not be in stock. The traffic might be so heavy that it takes you an hour to drive there. If you can only get skim milk, does that still count as picking up the milk? If traffic takes an hour and the milk is warm when you get back, does that count as success?

Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties. What if the RAM is bad? How about the hard disk? Will the network lose some packets? What if the web page is so complex that it takes 10 minutes to render on our puny CPU? What about the libraries I depend on? Is there a bug in there? What if the web page is rendered in an "ugly" way (a subjective term). Can I prove that that won't happen? Of course not.

As Einstein said: "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality."

The best I can do is set up a bunch of models and validate my program against them. Static type checking is one such model. Unit tests are another set of models. Tools like lint, sparse, cppcheck provide yet more tests. Another set of tests is giving the program to users and seeing if they like the responsiveness, the user interface, and the overall design.

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 22:17 UTC (Mon) by HelloWorld (guest, #56129) [Link]

> Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties.
Of course it's not currently feasible to prove the correctness of a complex program like a web browser, already because there is no formal mathematical specification for how HTML documents should be rendered. But it'd already be a success if certain properties of a program could be proven automatically. Properties such as "every file descriptor opened is also closed at some point" or "every array index is within the bounds of the array".

> What if the RAM is bad? How about the hard disk?
Bad RAM is not a software bug, thus it's out of scope for software developers. It's like saying that you can't prove anything in math because your axiom or logic may be "broken". You just have to assume something. In Peano numerals, it's the existence of the number 0, and if you want to prove stuff about programs, you have to assume that the machine you run it on works.

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 22:33 UTC (Mon) by njs (subscriber, #40338) [Link] (13 responses)

> In my humble (non-tenured) opinion, proofs belong in math class; model checking belongs in engineering.

In my experience, careful proofs can form a critical part of complex system design. I'm thinking of, for example, the problem of a VCS trying to merge a DAG of changes, each of which may contain arbitrary file adds/deletes/renames. This is a horrible problem with many incorrect solutions littering history, but, solvable with careful formal reasoning.

> Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties. What if the RAM is bad? How about the hard disk? Will the network lose some packets? What if the web page is so complex that it takes 10 minutes to render on our puny CPU? What about the libraries I depend on? Is there a bug in there? What if the web page is rendered in an "ugly" way (a subjective term). Can I prove that that won't happen? Of course not.

Yes, but that just means you tried to prove the wrong thing.

Can that web page trigger writes to arbitrary pages on my hard disk? Is there anywhere in my program that uses latin1 when it should be using UTF-8? Will this image decoder return either valid data or a defined error code on arbitrary inputs? Does this program invoke undefined behavior? Can this data be corrupted if certain code is scheduled concurrently? Proving those kinds of properties is totally useful, possible in principle (if you set things up right), and in some cases easily doable today.

> The best I can do is set up a bunch of models and validate my program against them. Static type checking is one such model.

But static type checking *is* a way to prove global properties of your program! Or another example: C++'s "private:" keyword is useful not because it gives some kind of 'security' or something (like many documents about it seem to think), but because it lets me know for certain (i.e., prove) that I only have to look at a certain bounded amount of code if I want to see how certain variables are modified. (And then that in turn is useful because it lets me verify that all that code maintains the relevant invariants, which is another kind of informal proof.)

I just want better tools for *non-heuristic* reasoning about programs. That's really not impossible -- though it might require changes to everything from the programming languages to your program's architecture -- and it would be useful to people without tenure, too.

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 22:56 UTC (Mon) by cmccabe (guest, #60281) [Link] (11 responses)

> Or another example: C++'s "private:" keyword is useful not because it
> gives some kind of 'security' or something (like many documents about it
> seem to think), but because it lets me know for certain (i.e., prove) that
> I only have to look at a certain bounded amount of code if I want to see
> how certain variables are modified

Actually, the "private" keyword in C++ allows you to prove no such thing. I can simply typecast the class to a byte buffer and modify to my heart's content.

If I want to be even more evil, I can add:
#define public private
to the beginning of my .cc file and include the header file for your class. Then nobody will stop me from doing whatever I want with your private data-and methods-- not the compiler, and certainly not the linker.

Java also has a way for "unauthorized" classes to get access to private data. I think you can use the Reflect package to get at it.

Private data in Java and C++ is not and was not intended to provide the kind of guarantees a proof checker would need.

On the other hand, if the goal is to allow the programmer to have a reasonable mental model, they work pretty well!

> I just want better tools for *non-heuristic* reasoning about programs.

Too bad. Artificial intelligence is about at where chemistry was in 1000 AD. Theorem provers are good for proving theorems, but bad at real-world reasoning.

On the other hand, what I can offer you is sandboxed programming langauges and model checkers. It's amazing how much more productive you can be when you have a garbage colector and a good type system.

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 22:58 UTC (Mon) by cmccabe (guest, #60281) [Link] (7 responses)

That should have been
#define private public
#define protected public

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 23:12 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (5 responses)

I think a compliant JVM's byte code verifier + security policy ensures that you can't use Reflection to poke the private members of somebody else's code. I think you'll get a SecurityException at the line where you attempt to violate policy if you go about it in the natural way. And if you try to write raw byte code to sidestep, the verifier will reject your .class as invalid.

Of course in a debug mode, or in a non-compliant JVM, or if there's a bug, you may make this work. But in /theory/ at least they've thought of this, so it would be fair for a Java programmer (unlike a C++ programmer) to treat private members as genuinely private.

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 23:30 UTC (Mon) by cmccabe (guest, #60281) [Link] (4 responses)

I can get the private data of any class that implements Serializable pretty easily. I can always get access to protected data members by defining my own subclass (unless the class is final, but putting protected data members in a final class is an odd choice).

I have a pretty good hunch that this API gives me a hole in "private" big enough to drive my truck through.
http://download.oracle.com/javase/1.5.0/docs/api/java/lan...

I haven't tried it, though.

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 1:07 UTC (Tue) by foom (subscriber, #14868) [Link]

If there's no security manager policy in the way, you certainly can access private members with the obvious reflection APIs. If there is a security manager policy, it also prevents you from doing the other Evil Things that would let you access the private members.

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 8:54 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

Not only might a class choose not to implement Serializable if it doesn't want you looking at its internals, it might also choose to implement Serializable in a way that's completely unhelpful while obeying the spec. In some cases this might be the most natural approach e.g. their magic internal values could change in a future otherwise compatible replacement, so they reconstruct them when de-serialising, rather than including them in the serialised output.

In other cases it would be very deliberate e.g. we can imagine a system where untrusted code runs in a Java sandbox on a remote system, and has access to certain serialisable objects which are sensitive, so their serialisations are encrypted, versioned and signed with keys not available to the untrusted code.

Even if you're allowed to make the subclass (security policy again) - your subclass doesn't get to look at protected data members from other instances, so this will often be useless. Remember this is Java, so type restrictions are enforced at runtime.

Basically this goes on and on, unlike in C++ the designers actually intended this to be enforced, not just a vague guideline to help those willing to help themselves. So even if you find a crack in the wall, someone will fix it. There really aren't any gaps "big enough to drive a truck through" as you imagine and as is the case in something like C++. If you want to drive a truck in, you need someone to conveniently open the truck-sized gate from the other side by disabling the relevant security policy.

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 9:01 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

"your subclass doesn't get to look at protected data members from other instances"

Actually this bit might be wrong. You could be able to just pass in a suitable instance and have the code inside your imposter subclass poke around in its protected internals. But again the security policy gets to decide whether you're allowed to make this subclass at all (unlike the 'final' keyword this places no such restriction on the author of the rest of the system who may very well operate under a different policy).

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 17:21 UTC (Tue) by jeremiah (subscriber, #1221) [Link]

Just as an FYI the easiest way to get access is to use the reflection API, and the setAccessible() method. Things get a little deep in the reflection stuff, but being able to gain access to anything and modify comes in handy sometimes. I generally use it when writing serialization libraries.

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 8:39 UTC (Tue) by chad.netzer (subscriber, #4257) [Link]

You also might need:

#define class struct

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 1:28 UTC (Tue) by foom (subscriber, #14868) [Link]

> If I want to be even more evil, I can add:
> #define public private
> to the beginning of my .cc file and include the header file for your class. Then nobody will stop me from doing whatever I want with your private data-and methods-- not the compiler, and certainly not the linker.

The compiler/linker will prevent that if you're using MS VC++: on that platform, the mangling for public and private methods is different! Forces you to be more evil than just #define to get your way. :)

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 14:35 UTC (Tue) by dgm (subscriber, #49227) [Link]

> It's amazing how much more productive you can be when you have a garbage colector and a good type system.

<flame>Oh, yes! You can churn buggy *and s... l... o... w* code much faster (and in greater quantities!) with that. The new guy in the corner just does this all day long. Thanks C#!</flame>

What every C Programmer should know about undefined behavior #2/3

Posted May 19, 2011 23:57 UTC (Thu) by marcH (subscriber, #57642) [Link]

> I can simply typecast the class to a byte buffer and modify to my heart's content. If I want to be even more evil, I can...

To demonstrate that code verification tools do not work, you hit them with a sledgehammer. Interesting and conclusive.

What every C Programmer should know about undefined behavior #2/3

Posted May 16, 2011 22:57 UTC (Mon) by mpr22 (subscriber, #60784) [Link]

Actually, given that C and C++ pointers have no fandango-on-core protection, to prove your private members never get edited by non-member non-friend code you have to prove the memory-access correctness of the entire program, including the underlying standard library.

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 9:00 UTC (Tue) by marcH (subscriber, #57642) [Link] (7 responses)

> ... and nobody has heard of them since.

Such a statement requires omniscience.

I am afraid you are not omniscient

http://en.wikipedia.org/wiki/Polyspace (just a random example)

What every C Programmer should know about undefined behavior #2/3

Posted May 17, 2011 22:38 UTC (Tue) by cmccabe (guest, #60281) [Link] (6 responses)

I'm not against the use of little proof checkers when writing code. Arguably, static type systems are just examples of little proof checkers. Another example is the lexing and parsing step where the compiler decides whether your program is syntactically correct. You can argue that while parsing the program, it proves that the program is parsable.

Polyspace looks like another such tool. It can prove that certain programs clearly violate C semantics, like dereferencing a NULL pointer. But it can't prove the program as a whole correct because it doesn't have enough information about the requirements and the environment.

There are other examples of little model checkers. Sparse allows kernel hackers to annotate functions with interesting properties like what locks it takes, etc. cpplint finds potential errors in C++ programs. And -Wall and -Wextra add even more checks. Those things are great and we should have more of them.

But if you did a survey of most academic programming language departments, you would find that most of them focus on developing completely new programming languages and rewriting things from scratch. Nothing is more common than a grad student inventing a new functional programming language for his thesis. Nothing is more uncommon than an actual useful tool coming out of it. Even your own example confirms this: Polyspace is commercial and not developed in academia.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 0:34 UTC (Wed) by price (guest, #59790) [Link] (5 responses)

> Polyspace is commercial and not developed in academia.

Wrong. It is commercial, but it came from academia.

Quick history of Polyspace, unfortunately quoting an obituary (http://christele.faure.pagesperso-orange.fr/AlainDeutsch.html - see original for many links):

1992-1993: Alain obtained his PhD on static analysis by abstract interpretation for dynamic data structures with pointers. He made a post-doc at the University of Carnegie-Mellon on the static analysis of communication protocols.
1993-1996: Alain joined PARA/INRIA Rocquencourt. He focussed on the static analysis of programs and developed the IABC analyser based on abstract interpretation.
1996-1999: He specialized his IABC analyzer to runtime error detection in Ada source code and proved the effectiveness of static analysis by exhibiting code errors in the flight programme of Ariane 5 incriminated in the flight 501 crash. Alain was a member of the qualification board of flight 502, which gave the green light for the second (successful) launch of Ariane 5. ...
1999-2006: He founded PolySpace Technologies together with Daniel Pilaud and became the Technical Director (Chief Technical Officer). PolySpace Technologies industrialized his IABC analyzer for Ada source code, extended it to C C++ Ada95 and commercialized PolySpace Verifier for Ada83, C, C++, Ada95.

So the tool was developed by a PhD researcher working at a research lab (and well-known hotbed of static type systems), refined for several years, and finally commercialized by him and others.

Another static analysis tool widely used in practice is Coverity -- that one came from a bunch of academics at Stanford. That's the usual story for this kind of tool.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 4:08 UTC (Wed) by cmccabe (guest, #60281) [Link] (4 responses)

I stand corrected. It looks like Polyspace came mostly from academia.

It's also interesting that it was associated with the launch of a European rocket, Ariane 5. Everyone knows that the space race in the 1960s helped to push technology ahead in the United States; I guess that still happens, at least to a certain extent.

I think this kind of research is really interesting. It seems like it could help create more efficient compilers and perhaps better tools.

For what it's worth, I like static type systems. I really hope that in the future, we'll be able to have more and more information about programs even before running them. Programmers ought to be free from the drudgery of spotting typos or passing the wrong arguments to functions. Or even accidentally dereferencing a pointer before assigning it. As I said before, though, there are always things in the design that are impossible to "prove" (in the mathematical sense), like user interface, the performance of heuristic algorithms, and artistic design.

What every C Programmer should know about undefined behavior #2/3

Posted May 18, 2011 5:02 UTC (Wed) by price (guest, #59790) [Link] (3 responses)

Huh, nice observation re: this technology being connected to the space race. I hadn't made that connection.

Definitely beats Tang, if you ask me.

What every C Programmer should know about undefined behavior #2/3

Posted May 19, 2011 3:53 UTC (Thu) by raven667 (subscriber, #5198) [Link] (2 responses)

If Feynman is to be believed then a significant part of modern technological living was developed at least in part as part of the space race and this is evidence that this continues to be true at a smaller scale.

What every C Programmer should know about undefined behavior #2/3

Posted May 19, 2011 13:11 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

Do not forget World War II just before that. In some cases the same people involved http://en.wikipedia.org/wiki/Wernher_von_Braun

What every C Programmer should know about undefined behavior #2/3

Posted May 19, 2011 18:01 UTC (Thu) by raven667 (subscriber, #5198) [Link]

well that part of the space race was about who captured the better germans at the end.