What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
Posted May 16, 2011 21:10 UTC (Mon) by njs (subscriber, #40338)In reply to: What every C Programmer should know about undefined behavior #2/3 by nix
Parent article: What every C Programmer should know about undefined behavior #2/3
For example, type systems (if properly abused) let you prove all sorts of interesting properties about global data flow (e.g., the property "no string ever goes from the user to the database without being sanitized"). There's no good reason why the behavior of your X server, kernel, web server, browser, etc. should ever be *uncomputable*.
Large-scale proofs about programs are very hard, but that's a tools problem, not a deep conceptual abyss where no-one should ever even try to tread. (And we'd probably have better tools -- and programming languages -- if people were less scared of the abyss.)
Posted May 16, 2011 21:50 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (23 responses)
In my humble (non-tenured) opinion, proofs belong in math class; model checking belongs in engineering. If you tried to prove any non-trivial thing about your life, you would quickly find it impossible.
Imagine trying to prove that you were going to go to the store and return with a gallon of milk. It sounds simple, but consider: how can you prove that you won't have a car crash or a heart attack? Even if none of those disasters happen, the store might be closed. The kind of milk you want might not be in stock. The traffic might be so heavy that it takes you an hour to drive there. If you can only get skim milk, does that still count as picking up the milk? If traffic takes an hour and the milk is warm when you get back, does that count as success?
Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties. What if the RAM is bad? How about the hard disk? Will the network lose some packets? What if the web page is so complex that it takes 10 minutes to render on our puny CPU? What about the libraries I depend on? Is there a bug in there? What if the web page is rendered in an "ugly" way (a subjective term). Can I prove that that won't happen? Of course not.
As Einstein said: "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality."
The best I can do is set up a bunch of models and validate my program against them. Static type checking is one such model. Unit tests are another set of models. Tools like lint, sparse, cppcheck provide yet more tests. Another set of tests is giving the program to users and seeing if they like the responsiveness, the user interface, and the overall design.
Posted May 16, 2011 22:17 UTC (Mon)
by HelloWorld (guest, #56129)
[Link]
> What if the RAM is bad? How about the hard disk?
Posted May 16, 2011 22:33 UTC (Mon)
by njs (subscriber, #40338)
[Link] (13 responses)
In my experience, careful proofs can form a critical part of complex system design. I'm thinking of, for example, the problem of a VCS trying to merge a DAG of changes, each of which may contain arbitrary file adds/deletes/renames. This is a horrible problem with many incorrect solutions littering history, but, solvable with careful formal reasoning.
> Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties. What if the RAM is bad? How about the hard disk? Will the network lose some packets? What if the web page is so complex that it takes 10 minutes to render on our puny CPU? What about the libraries I depend on? Is there a bug in there? What if the web page is rendered in an "ugly" way (a subjective term). Can I prove that that won't happen? Of course not.
Yes, but that just means you tried to prove the wrong thing.
Can that web page trigger writes to arbitrary pages on my hard disk? Is there anywhere in my program that uses latin1 when it should be using UTF-8? Will this image decoder return either valid data or a defined error code on arbitrary inputs? Does this program invoke undefined behavior? Can this data be corrupted if certain code is scheduled concurrently? Proving those kinds of properties is totally useful, possible in principle (if you set things up right), and in some cases easily doable today.
> The best I can do is set up a bunch of models and validate my program against them. Static type checking is one such model.
But static type checking *is* a way to prove global properties of your program! Or another example: C++'s "private:" keyword is useful not because it gives some kind of 'security' or something (like many documents about it seem to think), but because it lets me know for certain (i.e., prove) that I only have to look at a certain bounded amount of code if I want to see how certain variables are modified. (And then that in turn is useful because it lets me verify that all that code maintains the relevant invariants, which is another kind of informal proof.)
I just want better tools for *non-heuristic* reasoning about programs. That's really not impossible -- though it might require changes to everything from the programming languages to your program's architecture -- and it would be useful to people without tenure, too.
Posted May 16, 2011 22:56 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (11 responses)
Actually, the "private" keyword in C++ allows you to prove no such thing. I can simply typecast the class to a byte buffer and modify to my heart's content.
If I want to be even more evil, I can add:
Java also has a way for "unauthorized" classes to get access to private data. I think you can use the Reflect package to get at it.
Private data in Java and C++ is not and was not intended to provide the kind of guarantees a proof checker would need.
On the other hand, if the goal is to allow the programmer to have a reasonable mental model, they work pretty well!
> I just want better tools for *non-heuristic* reasoning about programs.
Too bad. Artificial intelligence is about at where chemistry was in 1000 AD. Theorem provers are good for proving theorems, but bad at real-world reasoning.
On the other hand, what I can offer you is sandboxed programming langauges and model checkers. It's amazing how much more productive you can be when you have a garbage colector and a good type system.
C.
Posted May 16, 2011 22:58 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (7 responses)
:)
Posted May 16, 2011 23:12 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
Of course in a debug mode, or in a non-compliant JVM, or if there's a bug, you may make this work. But in /theory/ at least they've thought of this, so it would be fair for a Java programmer (unlike a C++ programmer) to treat private members as genuinely private.
Posted May 16, 2011 23:30 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (4 responses)
I have a pretty good hunch that this API gives me a hole in "private" big enough to drive my truck through.
I haven't tried it, though.
Posted May 17, 2011 1:07 UTC (Tue)
by foom (subscriber, #14868)
[Link]
Posted May 17, 2011 8:54 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
In other cases it would be very deliberate e.g. we can imagine a system where untrusted code runs in a Java sandbox on a remote system, and has access to certain serialisable objects which are sensitive, so their serialisations are encrypted, versioned and signed with keys not available to the untrusted code.
Even if you're allowed to make the subclass (security policy again) - your subclass doesn't get to look at protected data members from other instances, so this will often be useless. Remember this is Java, so type restrictions are enforced at runtime.
Basically this goes on and on, unlike in C++ the designers actually intended this to be enforced, not just a vague guideline to help those willing to help themselves. So even if you find a crack in the wall, someone will fix it. There really aren't any gaps "big enough to drive a truck through" as you imagine and as is the case in something like C++. If you want to drive a truck in, you need someone to conveniently open the truck-sized gate from the other side by disabling the relevant security policy.
Posted May 17, 2011 9:01 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link]
Actually this bit might be wrong. You could be able to just pass in a suitable instance and have the code inside your imposter subclass poke around in its protected internals. But again the security policy gets to decide whether you're allowed to make this subclass at all (unlike the 'final' keyword this places no such restriction on the author of the rest of the system who may very well operate under a different policy).
Posted May 17, 2011 17:21 UTC (Tue)
by jeremiah (subscriber, #1221)
[Link]
Posted May 17, 2011 8:39 UTC (Tue)
by chad.netzer (subscriber, #4257)
[Link]
Posted May 17, 2011 1:28 UTC (Tue)
by foom (subscriber, #14868)
[Link]
The compiler/linker will prevent that if you're using MS VC++: on that platform, the mangling for public and private methods is different! Forces you to be more evil than just #define to get your way. :)
Posted May 17, 2011 14:35 UTC (Tue)
by dgm (subscriber, #49227)
[Link]
<flame>Oh, yes! You can churn buggy *and s... l... o... w* code much faster (and in greater quantities!) with that. The new guy in the corner just does this all day long. Thanks C#!</flame>
Posted May 19, 2011 23:57 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
To demonstrate that code verification tools do not work, you hit them with a sledgehammer. Interesting and conclusive.
Posted May 16, 2011 22:57 UTC (Mon)
by mpr22 (subscriber, #60784)
[Link]
Posted May 17, 2011 9:00 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (7 responses)
Such a statement requires omniscience.
I am afraid you are not omniscient
http://en.wikipedia.org/wiki/Polyspace (just a random example)
Posted May 17, 2011 22:38 UTC (Tue)
by cmccabe (guest, #60281)
[Link] (6 responses)
Polyspace looks like another such tool. It can prove that certain programs clearly violate C semantics, like dereferencing a NULL pointer. But it can't prove the program as a whole correct because it doesn't have enough information about the requirements and the environment.
There are other examples of little model checkers. Sparse allows kernel hackers to annotate functions with interesting properties like what locks it takes, etc. cpplint finds potential errors in C++ programs. And -Wall and -Wextra add even more checks. Those things are great and we should have more of them.
But if you did a survey of most academic programming language departments, you would find that most of them focus on developing completely new programming languages and rewriting things from scratch. Nothing is more common than a grad student inventing a new functional programming language for his thesis. Nothing is more uncommon than an actual useful tool coming out of it. Even your own example confirms this: Polyspace is commercial and not developed in academia.
Posted May 18, 2011 0:34 UTC (Wed)
by price (guest, #59790)
[Link] (5 responses)
> Polyspace is commercial and not developed in academia. Wrong. It is commercial, but it came from academia. Quick history of Polyspace, unfortunately quoting an obituary (http://christele.faure.pagesperso-orange.fr/AlainDeutsch.html - see original for many links): So the tool was developed by a PhD researcher working at a research lab (and well-known hotbed of static type systems), refined for several years, and finally commercialized by him and others. Another static analysis tool widely used in practice is Coverity -- that one came from a bunch of academics at Stanford. That's the usual story for this kind of tool.
Posted May 18, 2011 4:08 UTC (Wed)
by cmccabe (guest, #60281)
[Link] (4 responses)
It's also interesting that it was associated with the launch of a European rocket, Ariane 5. Everyone knows that the space race in the 1960s helped to push technology ahead in the United States; I guess that still happens, at least to a certain extent.
I think this kind of research is really interesting. It seems like it could help create more efficient compilers and perhaps better tools.
For what it's worth, I like static type systems. I really hope that in the future, we'll be able to have more and more information about programs even before running them. Programmers ought to be free from the drudgery of spotting typos or passing the wrong arguments to functions. Or even accidentally dereferencing a pointer before assigning it. As I said before, though, there are always things in the design that are impossible to "prove" (in the mathematical sense), like user interface, the performance of heuristic algorithms, and artistic design.
Posted May 18, 2011 5:02 UTC (Wed)
by price (guest, #59790)
[Link] (3 responses)
Definitely beats Tang, if you ask me.
Posted May 19, 2011 3:53 UTC (Thu)
by raven667 (subscriber, #5198)
[Link] (2 responses)
Posted May 19, 2011 13:11 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted May 19, 2011 18:01 UTC (Thu)
by raven667 (subscriber, #5198)
[Link]
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
Of course it's not currently feasible to prove the correctness of a complex program like a web browser, already because there is no formal mathematical specification for how HTML documents should be rendered. But it'd already be a success if certain properties of a program could be proven automatically. Properties such as "every file descriptor opened is also closed at some point" or "every array index is within the bounds of the array".
Bad RAM is not a software bug, thus it's out of scope for software developers. It's like saying that you can't prove anything in math because your axiom or logic may be "broken". You just have to assume something. In Peano numerals, it's the existence of the number 0, and if you want to prove stuff about programs, you have to assume that the machine you run it on works.
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
> gives some kind of 'security' or something (like many documents about it
> seem to think), but because it lets me know for certain (i.e., prove) that
> I only have to look at a certain bounded amount of code if I want to see
> how certain variables are modified
#define public private
to the beginning of my .cc file and include the header file for your class. Then nobody will stop me from doing whatever I want with your private data-and methods-- not the compiler, and certainly not the linker.
What every C Programmer should know about undefined behavior #2/3
#define private public
#define protected public
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
http://download.oracle.com/javase/1.5.0/docs/api/java/lan...
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
You also might need:What every C Programmer should know about undefined behavior #2/3
#define class struct
What every C Programmer should know about undefined behavior #2/3
> #define public private
> to the beginning of my .cc file and include the header file for your class. Then nobody will stop me from doing whatever I want with your private data-and methods-- not the compiler, and certainly not the linker.
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
Actually, given that C and C++ pointers have no fandango-on-core protection, to prove your private members never get edited by non-member non-friend code you have to prove the memory-access correctness of the entire program, including the underlying standard library.
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3
What every C Programmer should know about undefined behavior #2/3