What every C Programmer should know about undefined behavior #2/3
> The end result of this is that we have lots of tools in the toolbox to find some bugs, but no good way to prove that an application is free of undefined behavior. Given that there are lots of bugs in real world applications and that C is used for a broad range of critical applications, this is pretty scary.
Posted May 16, 2011 15:54 UTC (Mon)
by gowen (guest, #23914)
[Link] (11 responses)
This one is interesting because (a) something very much like it caused a real security hole in the Linux kernel recently and (b) the ONLY reason it exists is because of C's "declarations go at the start of the block" rule.

Someone wants to declare a variable, and knows it's good practice to initialise it, and, in the interest of style, wants to avoid an extra nested block. So they write

void contains_null_check(int *P) {
    int dead = *P;
    if (P == 0)
        return;
    *P = 4;
}

rather than

void contains_null_check(int *P) {
    if (P == 0) return;
    {
        int dead = *P;
        *P = 4;
    }
}

Result: bug.

C++ (since RAII strongly encourages initialise-at-declaration) and C99 (should it ever catch on) should make this one considerably less common.
Posted May 16, 2011 16:37 UTC (Mon)
by cfischer (guest, #3983)
[Link] (3 responses)
Why not just say:

void contains_null_check(int *P) {
    int dead;
    if (P == 0)
        return;
    dead = *P;
    /* do with 'dead' whatever you want */
}

Wouldn't that be the more 'natural' C style?

Methinks, the

    int dead = *P;

in the example is a misguided result of the urge to 'initialize', as opposed to 'assign'. In C++ those might be different, but in old-style C, for simple datatypes, avoiding the assignment for such reasons could already be seen as bad influence by new-fangled object-oriented hip languages.

So the problem seems more one of mixing styles to me.
Posted May 16, 2011 17:43 UTC (Mon)
by ledow (guest, #11753)
[Link]
But then, I'd also explicitly use NULL rather than 0.
Posted May 16, 2011 18:28 UTC (Mon)
by iabervon (subscriber, #722)
[Link] (1 responses)
If you then have to write C89, the easy thing is to move the whole line, ...

Of course, there's also the situation of a programmer who doesn't think C99 has caught on modifying a C99 codebase, and moving the line with the declaration up to the start of the block; the result may not be "natural" C89 to write, but it might be a "natural" C89 change to a "natural" C99 block.
Posted May 16, 2011 19:57 UTC (Mon)
by jreiser (subscriber, #11027)
[Link]
Often it is better to make a block by inserting the braces which enclose the intended scope.
Posted May 16, 2011 21:35 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (4 responses)
1. Initializing something before checking if it's NULL has nothing to do with whether you decide to put declarations at the start.
2. C99 doesn't have a "declarations go at the start of the block rule."
That being said, a lot of people consider it good style in C to put the declarations at the start of the block, because it encourages you to keep functions short and sweet.
Posted May 17, 2011 5:36 UTC (Tue)
by gowen (guest, #23914)
[Link] (3 responses)
I don't think that word means what you think it means. Clue: It doesn't mean "someone who doesn't share my opinion".
> 1. Initializing something before checking if it's NULL has nothing to do
> with whether you decide to put declarations at the start.
Combined with a coding rule that variables must be initialised when declared, it really kind of does. And that's a common coding rule, because using an uninitialised variable is - surprise - undefined behaviour. The vertical space between a variable's declaration and its initialisation represents a region in which using that variable is a bug. It is good practice to keep that space as small as possible ("but no smaller" being the extra advice that's forgotten in this case). Yes, it's not an unavoidable bug. But it's an unnecessary vector for bug transmission.

So, as mentioned above, this clash of good advice - or rather, a slightly blind application of usually-good advice - combined with C89's archaism on variable declaration resulted in a bug.
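To make the "gap" concrete, here is a minimal hypothetical sketch (not the kernel code in question): every line between the declaration and the assignment is a place where reading the variable would be undefined behaviour.

#include <stdio.h>
#include <string.h>

static void greet(const char *name) {
    size_t len;                 /* declared (uninitialised) here... */

    if (name == NULL)           /* ...reading 'len' anywhere in this */
        return;                 /* gap would be a bug... */

    len = strlen(name);         /* ...it only becomes safe to read here */
    printf("hello %s (%zu chars)\n", name, len);
}

int main(void) {
    greet("world");
    return 0;
}

Initialise-at-declaration closes that gap, but, as in the kernel bug, closing it by writing len = strlen(name) before the NULL check trades one bug for a worse one.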
For the recent Linux hole, why do *you* think the (presumably experienced) coder initialised the variable on declaration - which sadly preceded the NULL check?
> 2. C99 doesn't have a "declarations go at the start of the block rule."
I did actually mention that. So zero out of ten for reading the whole post.
Posted Jun 7, 2011 21:14 UTC (Tue)
by cmccabe (guest, #60281)
[Link] (2 responses)
Cherry-picking one example of a hole and then using it to justify your style preferences is fairly silly. I could equally well find a bug caused by signed overflow and say "aha! signed numbers are teh evil."

The reason why I prefer the C89 style of declaring the variables at the top of the block is that it tends to lead to shorter, clearer functions. If you end up with a page worth of declarations, it makes it clear even to the dullest programmer that his function is getting too long. It also makes it clearer how much stack space is actually being used, which is nice when you're really going for performance. And if you're not going for performance, what are you doing using C?

I understand the arguments for the C99/C++ "declare right before use" style. In C++ it's almost a must, because declarations trigger constructors to run, consuming CPU cycles. It also works well with C++'s RAII style. It can also move the definition closer to the use, making it easier to see the type. But again, that assumes you are writing gigantic, multi-page functions, which you *should not do*.
So basically, I think we are going to have to agree to disagree, for C at least. For C++, yes, you should declare as close as possible to where you use a variable.
Posted Jun 8, 2011 13:39 UTC (Wed)
by nix (subscriber, #2304)
[Link] (1 responses)
> If you end up with a page worth of declarations, it makes it clear even to the dullest programmer that his function is getting too long.

You have too much confidence in dull programmers. I have worked on functions with ten pages of variable declarations at the top. (The functions themselves were ten thousand lines long, which *anyone* should have realized was too long, but they had grown slowly to that length and nobody wanted to take the 'risk' of splitting them.)
Posted Jun 16, 2011 0:05 UTC (Thu)
by cmccabe (guest, #60281)
[Link]
Anyway, lazy or careless people can always find a way to do lazy or careless things. It is nice if you get a helpful hint that what you are doing is wrong, though. For example, using 4 or 8 space indents tends to give you a wake-up call that 20 levels of nesting might be more than the human mind can understand in C or C++. Using 1 or 2 space tabs does not, etc.
Posted May 17, 2011 8:40 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted May 17, 2011 14:55 UTC (Tue)
by nye (subscriber, #51576)
[Link]
Incorrect - it's not a result of dead code elimination. This bug arises even if dead is used for something later on, because the initial assignment invokes undefined behaviour in the case that P is NULL. It's therefore always conformant for the compiler to assume that P is *not* NULL and remove the check, since if that assumption is incorrect then the behaviour is undefined and the compiler can do whatever it damn well pleases.
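To spell that out, here is a sketch of the transformation the standard licenses (not any particular compiler's actual output):

/* As written: the load happens before the check. */
void contains_null_check(int *P) {
    int dead = *P;    /* undefined behaviour if P is NULL */
    if (P == 0)
        return;
    *P = 4;
}

/* What the optimizer may legitimately act as if you wrote: since the
 * load '*P' would be undefined for a NULL P, it may assume P != 0 and
 * delete the check -- whether or not 'dead' is used later. */
void contains_null_check_optimized(int *P) {
    int dead = *P;
    (void)dead;
    *P = 4;
}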
Posted May 16, 2011 16:14 UTC (Mon)
by benjamingeer (guest, #67678)
[Link] (3 responses)
Solution: Don't use C.
Posted May 16, 2011 16:25 UTC (Mon)
by felixfix (subscriber, #242)
[Link] (1 responses)
Posted May 16, 2011 18:34 UTC (Mon)
by rpbrennan (guest, #70904)
[Link]
Posted May 16, 2011 19:29 UTC (Mon)
by renox (guest, #23785)
[Link]
This doesn't mean that one should use C (premature optimisation is still the root of all evil), just that the reaction "undefined behaviour == don't use C" is naive at best.
Posted May 16, 2011 20:12 UTC (Mon)
by nix (subscriber, #2304)
[Link] (49 responses)
Posted May 16, 2011 20:26 UTC (Mon)
by jmalcolm (subscriber, #8876)
[Link] (6 responses)
The US Nuclear Arsenal could probably make the world come to an end, methinks. Is it controlled in any way by C (or Unix)? Or is it all in Ada?
This is not a slam on C by the way. I have a deep love of that crazy little language. It also commands enough respect to almost qualify as fear.
Your comment just made me chuckle.
Posted May 17, 2011 12:11 UTC (Tue)
by dgm (subscriber, #49227)
[Link] (5 responses)
Posted May 18, 2011 12:57 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (4 responses)
This is really a pointless comment. Here is a similar one: nothing can prevent the best burglars from breaking into your house (so why buy an expensive lock?)
> Some languages can help reduce "certain" kinds of errors, often trading-off execution speed and/or generality...
Yes.
> and introducing subtle, new kinds of errors.
No.
Posted May 19, 2011 15:22 UTC (Thu)
by dgm (subscriber, #49227)
[Link] (3 responses)
>This is really a pointless comment. Here is a similar one: Nothing can prevent the best burglars to break into your house (so why buy an expensive lock?)
I would not call it pointless, but I agree it's rather trivial. Anyway, it's useful to keep in mind when listening to vendors preaching the latest silver bullet.

>> and introducing subtle, new kinds of errors.
> No.
The logical conclusion would be, then, that a "perfect" language that prevents any kind of error is possible, which is absurd.
Posted May 19, 2011 23:45 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Let's keep the trivial statements coming: every vendor is preaching the latest silver bullet. It's their job; they are paid for it. Their lies do not mean every product sucks.
> The logical conclusion would be, then, that a "perfect" language that prevents any kind of error is possible,
Your logic is really beyond me.
Posted May 20, 2011 3:01 UTC (Fri)
by viro (subscriber, #7872)
[Link]
Their lies do not mean that water is wet either...
Posted May 20, 2011 6:10 UTC (Fri)
by dark (guest, #8483)
[Link]
That doesn't follow. An alternate conclusion is that even the subtlest errors are already possible in existing languages.
Posted May 16, 2011 21:08 UTC (Mon)
by HelloWorld (guest, #56129)
[Link] (9 responses)
This is, and has always been, a bullshit argument. The fact that we can't write a program that will tell us whether some program will halt doesn't mean that we can't prove it for some specific program we care about.
Posted May 17, 2011 1:32 UTC (Tue)
by wahern (subscriber, #37304)
[Link] (3 responses)
Posted May 17, 2011 5:26 UTC (Tue)
by cmccabe (guest, #60281)
[Link] (2 responses)
There was another tool for Ruby that would randomly alter your code at runtime (!) and see how you handled the resulting errors. I am having a really hard time remembering the name, though...
Posted May 17, 2011 10:26 UTC (Tue)
by pager2 (guest, #72197)
[Link]
Posted May 17, 2011 18:35 UTC (Tue)
by wahern (subscriber, #37304)
[Link]
Posted May 17, 2011 21:16 UTC (Tue)
by kleptog (subscriber, #1183)
[Link] (2 responses)
It's not clear that the programs we write are anywhere near that kind of complexity. I feel it should be possible to prove some useful properties of programs, but we lack some of the infrastructure for describing the things we want to prove. You can say "all strings must be UTF-8", but how can you explain that to a computer? There are always exceptions to deal with (the result of read(), for example).

It's pointed out elsewhere in this thread that what you want is to check the program against a model. The problem is, we (often) don't have the model written in a formal way. So we might need a program that tries to determine the model, which we humans then check and which can then be used to check the program. So you can get messages like: function foo is always called with a UTF-8 string, except for that call over there.
Perhaps this can happen during development, so you get prompted: method foo was always called with object bar, and now also with object baz, correct y/n? Depending on the result the model is updated. The question is, can you make this so it doesn't get in your way too much.
Posted May 18, 2011 14:20 UTC (Wed)
by jd (guest, #26381)
[Link]
The reason for using something like Z is that it is implementation-independent, so it doesn't make any difference if you're writing in C, C++, LISP or Prolog. There will be a valid mapping.
The disadvantage of Z is that in the same way the same specification can produce many implementations, the same implementation can produce many specifications. These will not be of equal use and I know of no easy way to generate the specification in a way that guarantees usefulness.
(For those more familiar with Z as the starting point, I'm totally inverting the flow. This may shock some. Sorry. With this scheme, I'm solely concerned with back-engineering what the specification of the code is from the code, not in generating code from a specification.)
The rationale is that Z is intended to be easier to validate than code. Easier, not easy. It's still hard work. But doing the validation (which is the hardest part) in a single frame of reference and converting to it from the different languages (which is hard but not as bad) should require less work than writing a validator specific to each language.
Posted May 18, 2011 20:10 UTC (Wed)
by njs (subscriber, #40338)
[Link]
That isn't quite right. You can easily write a program that answers those questions *if they have an answer* -- the halting problem is related to the fact that there might not be an answer at all. It's entirely possible that those particular questions do have answers, though, in which case a computer program could find them. But this doesn't help much in practice, because that computer program probably won't finish until sometime after the heat-death of the universe.
> You can say "all strings must be UTF-8", but how can you explain that to a computer? There's always exceptions to deal with (the result of read() for example).
Oh, but actually this is easy! Any reasonably competent statically-typed language can do this. E.g., in C++, you define a type "utf8_string", and you make sure that any publicly accessible way of constructing an instance does appropriate validity checking. Then code that assumes valid utf8 just declares that it takes input of type utf8_string. read(), of course, doesn't return a utf8_string, so if you want to read a string and then you want to pass it to some code that assumes utf8 (maybe somewhere else entirely in your program, after it's been passed through 10 other functions), the compiler won't let you unless you sanitize it first. And as you refactor your APIs, the conversion naturally gets moved around to be in the right place.
It works great. But I've only seen one project that used C++ like this. It's very sad :-(.
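A minimal sketch of that pattern (utf8_string is the name used above; the validator here is simplified and skips some overlong/surrogate corner cases):

#include <cstddef>
#include <stdexcept>
#include <string>

// Every publicly obtainable utf8_string holds validated UTF-8, so any
// function taking a utf8_string may assume validity.
class utf8_string {
public:
    // The only public way in: validate raw bytes (e.g. from read()).
    static utf8_string validate(const std::string &bytes) {
        if (!is_valid(bytes))
            throw std::invalid_argument("not valid UTF-8");
        return utf8_string(bytes);
    }
    const std::string &bytes() const { return data_; }

private:
    explicit utf8_string(const std::string &b) : data_(b) {}

    // Simplified structural check of lead/continuation bytes.
    static bool is_valid(const std::string &s) {
        for (std::size_t i = 0; i < s.size();) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            std::size_t extra;
            if (c < 0x80)                             { i += 1; continue; }
            else if ((c & 0xE0) == 0xC0 && c >= 0xC2) extra = 1;
            else if ((c & 0xF0) == 0xE0)              extra = 2;
            else if ((c & 0xF8) == 0xF0 && c <= 0xF4) extra = 3;
            else return false;
            if (i + extra >= s.size()) return false;
            for (std::size_t j = 1; j <= extra; ++j)
                if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
                    return false;
            i += extra + 1;
        }
        return true;
    }

    std::string data_;
};

// Code that assumes valid UTF-8 says so in its signature; the compiler
// refuses a raw std::string here until it has been validated.
void store_in_database(const utf8_string &s) { (void)s; }

int main() {
    store_in_database(utf8_string::validate("caf\xC3\xA9"));  // throws if invalid
    return 0;
}

read() hands you raw bytes; the only way to get them into store_in_database is through validate(), wherever in the program that conversion ends up living.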
Posted May 18, 2011 11:34 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
Just because you've proved it in the imaginary world of maths doesn't mean diddley-squat. The ONLY valid proof in the real world is "well, we haven't been wrong yet - but there's always a first time ...".
Cheers,
Wol
Posted May 18, 2011 14:22 UTC (Wed)
by jd (guest, #26381)
[Link]
Posted May 16, 2011 21:10 UTC (Mon)
by njs (subscriber, #40338)
[Link] (24 responses)
For example, type systems (if properly abused) let you prove all sorts of interesting properties about global data flow (e.g., the property "no string ever goes from the user to the database without being sanitized"). There's no good reason why the behavior of your X server, kernel, web server, browser, etc. should ever be *uncomputable*.

Large-scale proofs about programs are very hard, but that's a tools problem, not a deep conceptual abyss where no-one should ever even try to tread. (And we'd probably have better tools -- and programming languages -- if people were less scared of the abyss.)
Posted May 16, 2011 21:50 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (23 responses)
In my humble (non-tenured) opinion, proofs belong in math class; model checking belongs in engineering. If you tried to prove any non-trivial thing about your life, you would quickly find it impossible.
Imagine trying to prove that you were going to go to the store and return with a gallon of milk. It sounds simple, but consider: how can you prove that you won't have a car crash or a heart attack? Even if none of those disasters happen, the store might be closed. The kind of milk you want might not be in stock. The traffic might be so heavy that it takes you an hour to drive there. If you can only get skim milk, does that still count as picking up the milk? If traffic takes an hour and the milk is warm when you get back, does that count as success?
Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties. What if the RAM is bad? How about the hard disk? Will the network lose some packets? What if the web page is so complex that it takes 10 minutes to render on our puny CPU? What about the libraries I depend on? Is there a bug in there? What if the web page is rendered in an "ugly" way (a subjective term). Can I prove that that won't happen? Of course not.
As Einstein said: "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality."
The best I can do is set up a bunch of models and validate my program against them. Static type checking is one such model. Unit tests are another set of models. Tools like lint, sparse, cppcheck provide yet more tests. Another set of tests is giving the program to users and seeing if they like the responsiveness, the user interface, and the overall design.
Posted May 16, 2011 22:17 UTC (Mon)
by HelloWorld (guest, #56129)
[Link]
Of course it's not currently feasible to prove the correctness of a complex program like a web browser, not least because there is no formal mathematical specification for how HTML documents should be rendered. But it'd already be a success if certain properties of a program could be proven automatically. Properties such as "every file descriptor opened is also closed at some point" or "every array index is within the bounds of the array".

> What if the RAM is bad? How about the hard disk?

Bad RAM is not a software bug, thus it's out of scope for software developers. It's like saying that you can't prove anything in math because your axiom or logic may be "broken". You just have to assume something. In Peano arithmetic, it's the existence of the number 0, and if you want to prove stuff about programs, you have to assume that the machine you run it on works.
Posted May 16, 2011 22:33 UTC (Mon)
by njs (subscriber, #40338)
[Link] (13 responses)
In my experience, careful proofs can form a critical part of complex system design. I'm thinking of, for example, the problem of a VCS trying to merge a DAG of changes, each of which may contain arbitrary file adds/deletes/renames. This is a horrible problem with many incorrect solutions littering history, but it is solvable with careful formal reasoning.
> Similarly, if you try to prove that your browser will successfully display a web page, you slam head-on into a mountain of difficulties. What if the RAM is bad? How about the hard disk? Will the network lose some packets? What if the web page is so complex that it takes 10 minutes to render on our puny CPU? What about the libraries I depend on? Is there a bug in there? What if the web page is rendered in an "ugly" way (a subjective term). Can I prove that that won't happen? Of course not.
Yes, but that just means you tried to prove the wrong thing.
Can that web page trigger writes to arbitrary pages on my hard disk? Is there anywhere in my program that uses latin1 when it should be using UTF-8? Will this image decoder return either valid data or a defined error code on arbitrary inputs? Does this program invoke undefined behavior? Can this data be corrupted if certain code is scheduled concurrently? Proving those kinds of properties is totally useful, possible in principle (if you set things up right), and in some cases easily doable today.
> The best I can do is set up a bunch of models and validate my program against them. Static type checking is one such model.
But static type checking *is* a way to prove global properties of your program! Or another example: C++'s "private:" keyword is useful not because it gives some kind of 'security' or something (like many documents about it seem to think), but because it lets me know for certain (i.e., prove) that I only have to look at a certain bounded amount of code if I want to see how certain variables are modified. (And then that in turn is useful because it lets me verify that all that code maintains the relevant invariants, which is another kind of informal proof.)
I just want better tools for *non-heuristic* reasoning about programs. That's really not impossible -- though it might require changes to everything from the programming languages to your program's architecture -- and it would be useful to people without tenure, too.
Posted May 16, 2011 22:56 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (11 responses)
> C++'s "private:" keyword is useful not because it gives some kind of
> 'security' or something (like many documents about it seem to think),
> but because it lets me know for certain (i.e., prove) that I only have
> to look at a certain bounded amount of code if I want to see how
> certain variables are modified

Actually, the "private" keyword in C++ allows you to prove no such thing. I can simply typecast the class to a byte buffer and modify to my heart's content.
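For instance, a hypothetical illustration of that typecast (itself relying on non-guaranteed layout assumptions):

#include <cstring>

class Counter {
public:
    int get() const { return n_; }
private:
    int n_ = 0;
};

int main() {
    Counter c;
    int evil = 42;
    // Treat the object as a byte buffer and overwrite the private member.
    // Assumes n_ sits at offset 0 (true for this standard-layout class,
    // but exactly the sort of thing 'private' cannot police).
    std::memcpy(reinterpret_cast<unsigned char *>(&c), &evil, sizeof evil);
    return c.get() == 42 ? 0 : 1;  // the "private" member was modified
}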
If I want to be even more evil, I can add:

#define public private

to the beginning of my .cc file and include the header file for your class. Then nobody will stop me from doing whatever I want with your private data -- and methods -- not the compiler, and certainly not the linker.

Java also has a way for "unauthorized" classes to get access to private data. I think you can use the java.lang.reflect package to get at it.
Private data in Java and C++ is not and was not intended to provide the kind of guarantees a proof checker would need.
On the other hand, if the goal is to allow the programmer to have a reasonable mental model, they work pretty well!
> I just want better tools for *non-heuristic* reasoning about programs.
Too bad. Artificial intelligence is about where chemistry was in 1000 AD. Theorem provers are good for proving theorems, but bad at real-world reasoning.

On the other hand, what I can offer you is sandboxed programming languages and model checkers. It's amazing how much more productive you can be when you have a garbage collector and a good type system.
C.
Posted May 16, 2011 22:58 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (7 responses)
#define private public
#define protected public
:)
Posted May 16, 2011 23:12 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
Of course in a debug mode, or in a non-compliant JVM, or if there's a bug, you may make this work. But in /theory/ at least they've thought of this, so it would be fair for a Java programmer (unlike a C++ programmer) to treat private members as genuinely private.
Posted May 16, 2011 23:30 UTC (Mon)
by cmccabe (guest, #60281)
[Link] (4 responses)
http://download.oracle.com/javase/1.5.0/docs/api/java/lan...

I have a pretty good hunch that this API gives me a hole in "private" big enough to drive my truck through.
I haven't tried it, though.
Posted May 17, 2011 1:07 UTC (Tue)
by foom (subscriber, #14868)
[Link]
Posted May 17, 2011 8:54 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
In other cases it would be very deliberate e.g. we can imagine a system where untrusted code runs in a Java sandbox on a remote system, and has access to certain serialisable objects which are sensitive, so their serialisations are encrypted, versioned and signed with keys not available to the untrusted code.
Even if you're allowed to make the subclass (security policy again) - your subclass doesn't get to look at protected data members from other instances, so this will often be useless. Remember this is Java, so type restrictions are enforced at runtime.
Basically this goes on and on, unlike in C++ the designers actually intended this to be enforced, not just a vague guideline to help those willing to help themselves. So even if you find a crack in the wall, someone will fix it. There really aren't any gaps "big enough to drive a truck through" as you imagine and as is the case in something like C++. If you want to drive a truck in, you need someone to conveniently open the truck-sized gate from the other side by disabling the relevant security policy.
Posted May 17, 2011 9:01 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link]
Actually this bit might be wrong. You might be able to just pass in a suitable instance and have the code inside your imposter subclass poke around in its protected internals. But again the security policy gets to decide whether you're allowed to make this subclass at all (unlike the 'final' keyword this places no such restriction on the author of the rest of the system, who may very well operate under a different policy).
Posted May 17, 2011 17:21 UTC (Tue)
by jeremiah (subscriber, #1221)
[Link]
Posted May 17, 2011 8:39 UTC (Tue)
by chad.netzer (subscriber, #4257)
[Link]
You also might need:

#define class struct
Posted May 17, 2011 1:28 UTC (Tue)
by foom (subscriber, #14868)
[Link]
> #define public private
> to the beginning of my .cc file and include the header file for your class. Then nobody will stop me from doing whatever I want with your private data -- and methods -- not the compiler, and certainly not the linker.

The compiler/linker will prevent that if you're using MS VC++: on that platform, the mangling for public and private methods is different! Forces you to be more evil than just #define to get your way. :)
Posted May 17, 2011 14:35 UTC (Tue)
by dgm (subscriber, #49227)
[Link]
<flame>Oh, yes! You can churn buggy *and s... l... o... w* code much faster (and in greater quantities!) with that. The new guy in the corner just does this all day long. Thanks C#!</flame>
Posted May 19, 2011 23:57 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
To demonstrate that code verification tools do not work, you hit them with a sledgehammer. Interesting and conclusive.
Posted May 16, 2011 22:57 UTC (Mon)
by mpr22 (subscriber, #60784)
[Link]
Actually, given that C and C++ pointers have no fandango-on-core protection, to prove your private members never get edited by non-member non-friend code you have to prove the memory-access correctness of the entire program, including the underlying standard library.
Posted May 17, 2011 9:00 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (7 responses)
Such a statement requires omniscience.
I am afraid you are not omniscient:
http://en.wikipedia.org/wiki/Polyspace (just a random example)
Posted May 17, 2011 22:38 UTC (Tue)
by cmccabe (guest, #60281)
[Link] (6 responses)
Polyspace looks like another such tool. It can prove that certain programs clearly violate C semantics, like dereferencing a NULL pointer. But it can't prove the program as a whole correct because it doesn't have enough information about the requirements and the environment.
There are other examples of little model checkers. Sparse allows kernel hackers to annotate functions with interesting properties like what locks they take, etc. cpplint finds potential errors in C++ programs. And -Wall and -Wextra add even more checks. Those things are great and we should have more of them.
But if you did a survey of most academic programming language departments, you would find that most of them focus on developing completely new programming languages and rewriting things from scratch. Nothing is more common than a grad student inventing a new functional programming language for his thesis. Nothing is more uncommon than an actual useful tool coming out of it. Even your own example confirms this: Polyspace is commercial and not developed in academia.
Posted May 18, 2011 0:34 UTC (Wed)
by price (guest, #59790)
[Link] (5 responses)
> Polyspace is commercial and not developed in academia.

Wrong. It is commercial, but it came from academia. Quick history of Polyspace, unfortunately quoting an obituary (http://christele.faure.pagesperso-orange.fr/AlainDeutsch.html - see original for many links):

So the tool was developed by a PhD researcher working at a research lab (and well-known hotbed of static type systems), refined for several years, and finally commercialized by him and others.

Another static analysis tool widely used in practice is Coverity -- that one came from a bunch of academics at Stanford. That's the usual story for this kind of tool.
Posted May 18, 2011 4:08 UTC (Wed)
by cmccabe (guest, #60281)
[Link] (4 responses)
It's also interesting that it was associated with the launch of a European rocket, Ariane 5. Everyone knows that the space race in the 1960s helped to push technology ahead in the United States; I guess that still happens, at least to a certain extent.
I think this kind of research is really interesting. It seems like it could help create more efficient compilers and perhaps better tools.
For what it's worth, I like static type systems. I really hope that in the future, we'll be able to have more and more information about programs even before running them. Programmers ought to be free from the drudgery of spotting typos or passing the wrong arguments to functions. Or even accidentally dereferencing a pointer before assigning it. As I said before, though, there are always things in the design that are impossible to "prove" (in the mathematical sense), like user interface, the performance of heuristic algorithms, and artistic design.
Posted May 18, 2011 5:02 UTC (Wed)
by price (guest, #59790)
[Link] (3 responses)
Definitely beats Tang, if you ask me.
Posted May 19, 2011 3:53 UTC (Thu)
by raven667 (subscriber, #5198)
[Link] (2 responses)
Posted May 19, 2011 13:11 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted May 19, 2011 18:01 UTC (Thu)
by raven667 (subscriber, #5198)
[Link]
Posted May 16, 2011 23:08 UTC (Mon)
by salvarsan (guest, #18257)
[Link]
C is a well-defined devil, and the work-arounds are, too.
Posted May 17, 2011 8:38 UTC (Tue)
by stijn (subscriber, #570)
[Link] (5 responses)
Posted May 17, 2011 15:07 UTC (Tue)
by dgm (subscriber, #49227)
[Link] (4 responses)
I don't see why. What's the magic that makes C "blow harder" than any other language where a NULL pointer dereference is possible (assembler or Pascal, for example)?
> As a result we get duplicate mem and str libraries, already a good indication something is amiss.
Nonsense. Pascal, for instance, does not use zero-terminated strings (it uses size-prefixed strings), and neither do other languages like C++, C# or Java. And all of them have separate facilities for dealing with strings and arbitrary buffers.
> Reasoning about \0-terminated strings is hard, not because of buffer sizes, but because you always have to make sure that no stowaway \0 can possibly be present inside your string.
"Hard" is relative. I have foggy memories of having had some problem with an embedded zero in an string on my first week of writing C, like 20 years ago, but never after that.
Posted May 17, 2011 19:06 UTC (Tue)
by stijn (subscriber, #570)
[Link] (3 responses)
Posted May 17, 2011 21:57 UTC (Tue)
by baldridgeec (guest, #55283)
[Link]
So you use memchr() instead of strchr().
Or you use C++, and call it a string instead of a byte[]. (and use the actual C++ string-manipulation functions, not strchr - g++ warns about (byte[])string as being an invalid cast nowadays anyway though) :)
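For example, a small sketch of the difference, assuming a buffer whose length is tracked separately:

#include <stdio.h>
#include <string.h>

int main(void) {
    /* 11 bytes of data with an embedded '\0' in the middle. */
    const char buf[] = { 'h','e','l','l','o','\0','w','o','r','l','d' };

    /* strchr sees a C string: the search ends at the embedded '\0'. */
    const char *s = strchr(buf, 'w');              /* NULL */

    /* memchr takes an explicit length: an embedded '\0' is just data. */
    const void *m = memchr(buf, 'w', sizeof buf);  /* &buf[6] */

    printf("strchr: %s, memchr: %s\n",
           s ? "found" : "not found",
           m ? "found" : "not found");
    return 0;
}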
Posted May 19, 2011 14:49 UTC (Thu)
by dgm (subscriber, #49227)
[Link] (1 responses)
Posted May 19, 2011 15:49 UTC (Thu)
by stijn (subscriber, #570)
[Link]