
Python cryptography, Rust, and Gentoo


Posted Feb 12, 2021 10:55 UTC (Fri) by khim (subscriber, #9252)
In reply to: Python cryptography, Rust, and Gentoo by Wol
Parent article: Python cryptography, Rust, and Gentoo

> undefined behaviour is whatever the hardware does

If that is the definition of undefined behavior, then what the heck is implementation-defined behavior?

No, the confusion is much deeper. “Undefined behavior” always meant what it means today. And, in fact, most types of undefined behavior don't cause any confusion. Attempts to read through a pointer after calling free, or to read an uninitialized variable, rarely cause confusion.

Something like this:

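#include <stdio.h>

int foo() {
  int i;
  i = 42;  /* leaves 42 in foo's stack slot; nothing is returned or used */
}

int bar() {
  int i;
  return i;  /* reads an uninitialized variable: undefined behavior */
}

int main() {
  foo();
  printf("%d\n", bar());
}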

Should code like the above work or not? Clang breaks it even when compiled with -O0 (gcc with -O0 makes it work, although any other optimization level breaks it).

I don't know any practicing programmer who says compilers should support code like the above example.

The tragedy happened when the decisions of the C standards committee clashed with developers' expectations. Because C was designed for writing portable programs, lots of things which are, actually, well-defined (yet different!) on many platforms were put into the “undefined behavior, don't use” bucket (instead of the “implementation-defined, use carefully” bucket).

The intention was, of course, to make programs portable, but something completely different happened instead: so many “implementation-defined, use carefully” things were marked as “undefined behavior” that developers started thinking that “undefined behavior” means precisely that: whatever the hardware does.

And now we have all that mess.

But no, “undefined behavior” never meant whatever the hardware does. Not even in C89. It was always “something your program should never do.”



Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 12:52 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

I think that mistake proves my point ... :-)

Undefined, implementation dependent, whatever. The point is, it BREAKS THE PROGRAMMER'S MENTAL MODEL.

And however much you want to blame the programmer, if programmers keep on doing it, it's a design fault ...

Cheers,
Wol

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 17:38 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> And however much you want to blame the programmer, if programmers keep on doing it, it's a design fault ...

Got it. So we have issues with C which even Rust doesn't fully address:

- if you put the check outside of the loop then it won't test all elements of the array.

- if you initialize your variable after it's used then the program doesn't work.

- if you change a variable then other variables (which were calculated on the basis of that variable) don't change as they should.

- you need to actually allocate memory for your data structure; just declaring a pointer doesn't mean you can use it.

And I can probably add dozens more.

</sarcasm off>.

Granted: these are the expectations of people who started studying programming about two months ago… but they are very, very common.

Should we do something about them? If yes, then what… if no, then why the heck not?…

> The point is, it BREAKS THE PROGRAMMER'S MENTAL MODEL.

Sure, but pretty much anything can break it if the programmer is not taught properly.

C (and C++) suffer mostly from Hyrum's Law: many things which were not supposed to work… actually work with real-world compilers. And then, later… they stop (even if the documentation always warned not to use them)… and that is when trouble happens (think of the glibc story).

That's the only problem with C/C++… but it's pretty severe: the C language on paper and the C language as implemented by a typical compiler were different for so long that it's unclear what can be done at this point.

The thing is: I'm not sure switching to Rust (or any other language) would save us. After 10-20-30 years they would be in the same situation, too.

I'm not even really sure what can be done about it. Have just one fixed compiler without any changes? I don't think it would really work.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 22:20 UTC (Fri) by roc (subscriber, #30627) [Link]

> I'm not sure switching to Rust (or any other language) would save us. After 10-20-30 years they would be in the same situation, too.

No they won't.

Rust is designed to eliminate "undefined" or "implementation defined" behavior outside of explicit "unsafe" blocks. Yes, there will be compiler bugs etc, but really there will be vastly fewer such problematic behaviors in Rust programs than in C and C++ programs.

That means we can expect Rust programs to behave much more consistently over time than C/C++ programs, as hardware and compilers evolve.

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 18:13 UTC (Fri) by anselm (subscriber, #2796) [Link] (1 responses)

> Tragedy happened when decisions of C standards committee clashed with developer's expectations. Because C was designed to create portable programs lots of things which are, actually, well-defined (yet different!) on many platforms were put into “undefined behavior, don't use” bucket (instead of “implementation-defined, use carefully” bucket).

AFAIR, the C89 standard carefully distinguished between “undefined” and “implementation-defined” behaviour. “Implementation-defined” behaviour is very emphatically not “undefined” behaviour, it's just that it is not defined by the language standard but by the various implementations (or their underlying platforms).

For example, the result of the >> operator applied to a negative signed integer is implementation-defined – many platforms offer a choice between arithmetical and logical right-shift and the compiler writer needs to pick one of the two, but after that, that particular compiler on that platform will always do it that way. (The reason why this particular behaviour was declared implementation-defined is probably that Ritchie didn't stipulate what was desired and by the late 1980s there were enough C implementations doing it one way or the other that nobody could agree anymore on which way was “correct” without making the other half of the industry “wrong”, and breaking programs that relied on the other behaviour.)

With appropriate care, you can exploit implementation-defined behaviour – especially if your set of implementations is small –, but with undefined behaviour, all bets are off. If you're interested in C code that is maximally portable between implementations, implementation-defined behaviour is, of course, something to avoid, but again it is a good idea to flag it as such in the standard so people can be aware of it.
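To make the right-shift case concrete, here is a minimal sketch of how code that has to be portable can sidestep the implementation-defined result. The helper name asr is hypothetical (it is not from the standard or any library), and the sketch assumes that int has no padding bits:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: arithmetic (sign-propagating) right shift of x by n
 * bits that never applies >> to a negative value, so the result does not
 * depend on the implementation's choice. */
static int asr(int x, unsigned n)
{
    /* Assumption: int has no padding bits, so this is its width. */
    assert(n < sizeof x * CHAR_BIT);

    if (x >= 0)
        return x >> n;              /* fully defined for non-negative x */

    /* -(x + 1) is non-negative and cannot overflow, even for INT_MIN.
     * Shifting it and mapping back gives floor(x / 2^n). */
    return -((-(x + 1)) >> n) - 1;
}
```

On an implementation that already picks the arithmetic shift, asr(x, n) and x >> n should agree; the helper only matters when the same source has to behave identically everywhere.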

Python cryptography, Rust, and Gentoo

Posted Feb 12, 2021 20:31 UTC (Fri) by khim (subscriber, #9252) [Link]

> AFAIR, the C89 standard carefully distinguished between “undefined” and “implementation-defined” behaviour.

Yes, but that wasn't my point.

You have explained perfectly why a right shift of a negative value is “implementation-defined” behavior. All is very logical and proper.

But what about a shift by a negative value? Many (most?) low-level programmers expect that this would be “implementation-defined”, too. After all, most CPUs do something predictable when they get a negative value as the shift count (different ones do different things, but all CPUs I know do something predictable). It's more or less the same as with a shift of a negative value: there may be different outcomes on different CPUs, yet there would be some outcome, right?

Well… no.

If you actually open the C89 standard you will see that “the result of a right shift of a negative-valued signed integral type (6.3.7)” is listed in “Appendix G, part 3 Implementation-defined behavior”… yet “an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted (6.3.7)” is not in part 3… it's in “Appendix G, part 2 Undefined behavior”!

I would love to know why that difference is there. Do some CPUs lock up when faced with a negative shift? Or does something crazy happen (like: it takes so long that DRAM starts losing its contents)? Or maybe some compiler couldn't handle it? Or… maybe the committee just decided that if they declared it “undefined behavior” then people would stop using it and compiler writers could generate better code?

I have no idea, really. But the end result: -1 >> 1 is “implementation-defined behavior” yet 1 >> -1 is “undefined behavior”.

To most low-level guys this is sheer insanity… yet that's how C89 is defined.
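A minimal sketch of what staying on the defined side looks like: validate the shift count before the shift is ever evaluated. The name checked_shr is hypothetical, and the width computation assumes unsigned has no padding bits:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores value >> count in *out and returns true,
 * or returns false (leaving *out untouched) when the count is negative
 * or not smaller than the width, i.e. when the shift would be undefined. */
static bool checked_shr(unsigned value, int count, unsigned *out)
{
    /* Assumption: unsigned has no padding bits, so this is its width. */
    const int width = (int)(sizeof value * CHAR_BIT);

    if (count < 0 || count >= width)
        return false;

    *out = value >> count;
    return true;
}
```

The caller decides what an out-of-range count should mean; the point is only that the decision is made in the source and not left to the compiler.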

> If you're interested in C code that is maximally portable between implementations, implementation-defined behaviour is, of course, something to avoid, but again it is a good idea to flag it as such in the standard so people can be aware of it.

It's actually done in exactly this way. Not only does the C standard distinguish “unspecified behavior”, “implementation-defined behavior”, and “undefined behavior”; it actually has all of them listed in three separate parts of Appendix G! To make sure no one would mix them up.

The only problem: actual programmers don't consult these lists when they are writing code. They try to guess, based on their mental model. And for most programmers the mental model either says that you can't shift a negative value and you can't shift by a negative value, either (these are the sorta-lucky ones: they may not write the fastest code, yet they tend to write correct code) or, alternatively, they assume you can push anything you want into a shift and get something back… and then they write something like (a >> (i-1)) * i with the comment /* if i == 0 then result is zero and we don't care what a >> (i-1) produces */… only then a modern compiler “looks” at that, notices that i can never be zero (because that would lead to undefined behavior), happily nukes the check if (i == 0) and removes the “dead code”.

And that is where the shouting starts. The C89 standard clearly says that “undefined behavior” could lead to anything at all… yet “advanced programmers” say that “removing code which I specifically wrote there to catch errors is not anything at all in my book”… hilarity ensues.
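For the (a >> (i-1)) * i case, a sketch of a version where the i == 0 test cannot be removed, because it runs before any shift is evaluated. The name scaled_shift and the choice to return 0 for oversized counts are assumptions made for illustration, as is the absence of padding bits in unsigned:

```c
#include <limits.h>

/* Hypothetical helper computing (a >> (i - 1)) * i, with i == 0 defined
 * to give 0. Every path that reaches the shift has a count in range. */
static unsigned scaled_shift(unsigned a, unsigned i)
{
    /* Assumption: unsigned has no padding bits, so this is its width. */
    const unsigned width = (unsigned)(sizeof a * CHAR_BIT);

    if (i == 0)
        return 0;   /* handled before the shift, so it is not dead code */
    if (i > width)
        return 0;   /* i - 1 >= width would be undefined; treated here as
                       "every bit shifted out" (an illustrative choice) */

    return (a >> (i - 1)) * i;
}
```

Since no execution of this function can reach an out-of-range shift, the compiler has no undefined behavior to reason from and has to keep both checks.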

P.S. I wonder if people who developed C89 are still alive and can say what they think about all that… does anyone know?


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds