
Modern C for Fedora (and the world)

By Jonathan Corbet
December 8, 2023
It can be instructive to pull down the dog-eared copy of the first edition of The C Programming Language that many of us still have on our bookshelves; the language has changed considerably since that book was published. Many "features" of early C have been left behind, usually for good reasons, but there is still a lot of code in the wild using those features. A concerted effort is being made in both the Fedora and GCC communities to fix that old code and enable some new errors in the GCC 14 release (which is in stage 3 of its development cycle and likely to be released by mid-2024), but a fair amount of work remains to be done.

There are a number of constructs that were normal in 1980s C, but which are seen as a breeding ground for bugs now. These include (a combined example follows the list):

  • Implicit function declarations: if code calls function() without having first included a declaration for that function, the compiler implicitly declares it as taking unspecified parameters and returning an int value. That may not be how the function is actually defined, opening up possibilities for all kinds of confusion.
  • Implicit integer declarations: a variable declared with just a storage class (static, for example) is implicitly deemed to be an int. C++ has already adopted type inference, where the compiler deduces a variable's type from the expression used to initialize it. There are schemes afoot to add a similar feature to C, but type inference is incompatible with implicit int.
  • Conversions between pointers and integers: original C played fast and loose with pointer values, allowing them to be converted to and from int values at will. Whether such constructions actually work on current architectures (where a pointer is likely to be 64 bits and an int 32 bits) is a matter of chance.
  • Inconsistent return statements: old-style C paid little attention to whether a function returned a value or not; a function declared int could do a bare return (or just fall off the end with no return statement at all), and void functions could attempt to return a value without complaint. Good things will not happen if a function fails to return a value that the caller is expecting.
  • Missing parameter types in function definitions: C would accept such definitions, assigning no type to the parameter at all. That means that typos in a function prototype (such as "f(ant)" instead of "f(int)") can give surprising results.
  • Assignments between incompatible pointer types: continuing in the "fast and loose with pointers" theme, early C had no objections to assigning a pointer value to an incompatible type without even a cast. Sometimes a developer writing such an assignment knew what they were doing; other times not.
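
These constructs are easiest to see side by side. The file below is a fabricated illustration (it comes from no real package); current GCC emits a diagnostic for each item in the list above:

static count;                    /* implicit int                        */

bump(val)                        /* implicit int return type, and an    */
{                                /* untyped parameter 'val'             */
    count += val;
}

int answer(void)
{
    return;                      /* inconsistent return statement       */
}

char *name_of(long id)
{
    return id;                   /* integer converted to a pointer      */
}

int main(void)
{
    int *ip = name_of(42);       /* incompatible pointer assignment     */
    bump(frobnicate());          /* implicit function declaration:      */
    return ip == 0;              /* frobnicate() is declared nowhere    */
}                                /* above this point                    */

int frobnicate(void) { return 1; }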

Current GCC versions will issue warnings for the above constructs, but will proceed to compile the code anyway. Florian Weimer, though, would like to change that situation; at the end of November, he posted an update on work toward turning the warnings for obsolete C constructs into hard errors instead. This would seem like a sensible thing to do; those constructs have been deprecated for ages; they can hide bugs or prevent the adoption of new language features and should not be appearing in modern code.

There is only one little problem: a lot of code in the free-software world is not modern. Simply turning all of those warnings into errors has the potential to break the compilation of numerous packages — an outcome that is not likely to be universally welcomed. To address this problem, the Fedora project has been working on a "porting to modern C" project since at least late 2022. The idea is to find the packages in Fedora that fail to build with the new errors and fix them, sending those fixes upstream whenever possible. Once Fedora builds correctly, chances are that the amount of old code that remains will be relatively small.

Weimer has also posted an update on the Fedora work. There are, it seems, still a number of packages (out of about 15,000 tested) that generate errors indicating the presence of old code:

Error type                       Packages
Implicit function declaration          53
Implicit integer declaration            2
Integer conversion                     99
Return mismatch                        13
Missing parameter type                  0
Pointer assignment                    374

While quite a bit of progress has been made toward the goal of building Fedora with the new errors, Weimer points out that the job is not yet done:

As you can see, the incompatible-pointer-types issues are a bit of a problem. We fixed over 800 packages during the first round, and now it looks like we are only two thirds done.

It is unlikely that I will be able to work on all these issues myself or with help from the people around me. I just suggested to GCC upstream that we may have to reconsider including this change in the GCC 14 release.

Weimer included a separate column for programs that may be miscompiled because autoconf may be confused by the new errors. For example, many of its checks don't bother to declare exit(); they will fail to compile if the error for implicit function declarations is enabled, causing autoconf to conclude that the feature it is checking for is absent. There also appear to be problems with the Vala language, which compiles to obsolete C. Vala has not been under active development for some time, and this problem seems unlikely to be fixed.
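
That autoconf failure mode looks roughly like this; the probe below is a simplified sketch of the sort of test program configure compiles, not a verbatim autoconf template:

char some_function ();

int
main ()
{
  some_function ();
  exit (0);   /* exit() is never declared; once implicit function
                 declarations are a hard error, this probe no longer
                 compiles and configure wrongly reports that
                 some_function() is missing */
}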

The current plan is to continue this work, focusing mostly on the Fedora Rawhide development distribution. Efforts will be made to deal with the autoconf problem and to put some sort of hack into Vala, but that still leaves hundreds of packages needing further attention. If they cannot be fixed in time, it may not be possible to enable all of those errors in the GCC 14 release.

Part of the problem, perhaps, is that it appears to have fallen on Fedora and GCC developers to make these fixes. In many cases, this may be the result of the lack of a viable upstream for many packages; we are probably all using more unmaintained code than we like to think. At its best, this work might shine a light on some of those packages and, in a truly optimistic world, bring out developers who can pick up that maintenance and modernize the code. In many cases, it should be a relatively straightforward task and a reasonable entry point into maintainership. With enough help, perhaps we can finally leave archaic C code behind.



Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:08 UTC (Fri) by willy (subscriber, #9762) [Link] (61 responses)

I'm surprised that any code using implicit-function-declaration survived the move to 64-bit. I thought that was more or less guaranteed to be fatal. Maybe only on some architectures.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:21 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (2 responses)

It was usually fatal on ia64, but crucially, it often wasn't fatal on amd64. I remember we implemented a special wrapper for Ubuntu builds to grep the build log for such warnings and forcibly fail the build if we spotted them ...

Modern C for Fedora (and the world)

Posted Dec 9, 2023 10:59 UTC (Sat) by fw (subscriber, #26023) [Link] (1 responses)

Even more surprising is int-conversion, which introduces even more pointer clipping to 32 bits.

But it turns out that on x86-64, without PIE, global data, constants, and the heap all are in the first 32 bits of the address space. Even today, only the stack is outside that range. So you get surprisingly far with 32-bit pointers only. It really shouldn't work, but it does in many cases. But of course PIE changes that.
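
A hypothetical two-file sketch of the mechanism (illustrative only, and assuming an LP64 target):

/* alloc.c */
#include <stdlib.h>
void *grab(void) { return malloc(100); }

/* main.c -- note: no declaration of grab() is in scope */
#include <stdio.h>
int main(void)
{
    /* grab() is implicitly declared as returning int, so only the
       low 32 bits of the returned pointer survive the call. Without
       PIE the heap usually sits below 4GiB, so the truncated value
       often still works; with PIE it usually does not. */
    long p = grab();
    printf("%lx\n", p);
    return 0;
}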

Modern C for Fedora (and the world)

Posted Dec 16, 2023 8:16 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

> the heap

surely that depends on the size of your heap

Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:32 UTC (Fri) by Paf (subscriber, #91811) [Link] (28 responses)

I don’t think so - very few architectures have 64 bit ‘int’. Like, hardly any. Int is largely stuck as a 32 bit type.

So the assumption that made most of that code work didn’t become wrong with the advent of 64 bit.

Fwiw, I work in a project that has long done Wall and Werror and so all of these constructs terrify me :)

Modern C for Fedora (and the world)

Posted Dec 8, 2023 19:34 UTC (Fri) by roc (subscriber, #30627) [Link] (27 responses)

Enabling -Wall and -Werror by default is problematic because it means your code breaks every time a compiler introduces a new warning under -Wall.

Though, maybe compilers have stopped adding warnings to -Wall and now only add to -Wextra instead? I wish I knew.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 22:23 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

> Enabling -Wall and -Werror by default is problematic because it means your code breaks every time a compiler introduces a new warning under -Wall.

Eh, it depends what you want to get out of Wall/Werror. If you're a distro, of course you don't want to use it, it will break all the packages all the time. If you're an upstream, and you also require zero lint errors (for whatever linter your project is using), then this is much less problematic. By the time something makes it into -Wall, the linters have probably been complaining about it for years, and so in practice, the amount of breakage when you upgrade to a new compiler is rather limited. And you always have the option of (temporarily) doing -Wall -Wno-foo if a particular warning causes issues.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 7:56 UTC (Sat) by wtarreau (subscriber, #51152) [Link]

That's what we do on haproxy: we build with -Wall -Wextra everywhere, and in addition to this, developers and CI have -Werror enabled. Due to the diversity of distros used by developers (and the CI) we generally manage to make sure no warnings are left for distros at release time.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 11:09 UTC (Sat) by Sesse (subscriber, #53779) [Link]

But upstream code tends to move into distros. :-) So shipping upstream with -Werror is pretty unhelpful. But running it _while developing_ is great, at least as long as you don't need to support like eight different obsolete C compilers with weird and different warning sets.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 22:40 UTC (Fri) by fwiesweg (guest, #116364) [Link] (19 responses)

That probably depends very much on your codebase. If you are badly understaffed for the age and amount of your code, yes, probably stay away from it. That's about where I started with my projects a decade ago.

On the other hand, if you are able to keep up with the load, it's about the best thing you can do. With each modernization push, enforced by making warnings fail hard, the number of runtime errors can be brought down considerably, and by now nearly all issues we have are caused by missing or disabled static checks.

Of course, updating was gruesome, tedious work, but it makes life afterward much more relaxed and enjoyable. I even ran a Friday deployment today without being overly worried, something I'd never have done just five years ago. In the long run, -Wall was really worth it.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 23:45 UTC (Fri) by roc (subscriber, #30627) [Link] (15 responses)

It's OK for *developers* to hit these errors, but it's broken for random users who just want to build upstream with a compiler that's newer than what the developers are using.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 0:30 UTC (Sat) by pbonzini (subscriber, #60935) [Link] (13 responses)

Just make sure that -Werror is easy to disable and enabled only when building from a VCS checkout, or something like that.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 1:02 UTC (Sat) by roc (subscriber, #30627) [Link] (6 responses)

People building from upstream are typically building from a VCS checkout so that doesn't help.

Currently we build with -Werror -Wall for CMAKE_BUILD_TYPE=DEBUG, and not for CMAKE_BUILD_TYPE=RELEASE. That's assuming developers build regularly with DEBUG and people who just want a working upstream build don't. It works out OK in practice. It doesn't seem ideal but maybe it's about as good as it can be.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 8:15 UTC (Sat) by pm215 (subscriber, #98099) [Link] (4 responses)

The difficulty with doing it based on debug versus release is that often you want your debug build to be -O0 because the debugging experience is so much nicer, but that also turns off a lot of the data flow analysis that is needed for some of the warning categories.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 11:11 UTC (Sat) by smcv (subscriber, #53363) [Link] (3 responses)

Is gcc -Og ("turn on optimizations that don't hurt debugging too much") a reasonable compromise for debug mode?

Modern C for Fedora (and the world)

Posted Dec 9, 2023 16:41 UTC (Sat) by pm215 (subscriber, #98099) [Link] (2 responses)

It's supposed to be, but in practice I have found it is not, which makes it pretty useless. I tried -Og, got burned (by gdb reporting it could not tell me the values of variables in my program because they had been "optimized away") and went back to -O0.

Unless the compiler authors commit to "-Og will never lose debug info that is present in -O0" I personally use and advise others to use -O0.

Modern C for Fedora (and the world)

Posted May 12, 2024 4:41 UTC (Sun) by koh (subscriber, #101482) [Link] (1 responses)

Exact same experience for me in both C and C++. I have not found any use for -Og so far.

We have -Werror on also for optimized debug builds (those with -O2 -UNDEBUG) in the CI. The only way they differ from release builds is NDEBUG. Locally, by default, I run -Wno-error, because I frequently switch between compilers/versions.

Modern C for Fedora (and the world)

Posted May 12, 2024 14:00 UTC (Sun) by pizza (subscriber, #46) [Link]

> Exact same experience for me in both C and C++. I have not found any use for -Og so far.

Over the years I've had to work with codebases that simply wouldn't *fit* in the available space without a combination of -Os and LTO. -Og has proven to be quite useful in a context where -O0 simply isn't feasible.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 12:11 UTC (Sat) by kreijack (guest, #43513) [Link]

> People building from upstream are typically building from a VCS checkout so that doesn't help.

People building from upstream should be able to deal with these kinds of issues; usually that just means reading a README file.
If not, they should use a distro package.

> Currently we build with -Werror -Wall for CMAKE_BUILD_TYPE=DEBUG, and not for CMAKE_BUILD_TYPE=RELEASE. That's assuming developers build regularly with DEBUG and people who just want a working upstream build don't. It works out OK in practice. It doesn't seem ideal but maybe it's about as good as it can be.

This is a sane principle.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 8:35 UTC (Sat) by marcH (subscriber, #57642) [Link] (5 responses)

-Werror (and others) should be easy to turn on and off, this is very important.

What I found to work well is to have -Werror added only in pre-merge CI. Not having it by default makes prototyping more convenient.

This is consistent with running linters in pre-merge CI while not forcing developers to run them all the time.

None of this is specific to C.

Of course you need to have some pre-merge CI in the first place. If you don't even have that minimal level of CI then the project is basically unmaintained.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 14:46 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (4 responses)

Even there I find `-Werror` is not the best solution. We prefer to keep the warnings and allow the build to continue on after hitting an "error" and then fail the job at the very end if any warnings happened. This allows one to get more than one round of warnings out of CI at a time. Real failures still stop the build because continuing the build after a hard failure (`make -i`) is a recipe for chasing wild geese. The `-k` flag can be useful, but is a tradeoff between wasted CI cycles and comprehensive error reports.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 17:54 UTC (Sat) by marcH (subscriber, #57642) [Link] (3 responses)

Showing all warnings is nice but you don't want CI to babysit developers too much, otherwise there will always be a couple of lazy (and frankly: not very smart) developers who will "spam" CI because they can't be bothered to find out how to enable -Werror themselves. They wrongly think they save time that way, while they waste and in some cases even slow down the whole CI infrastructure. Been there, seen that.

This being said, the simplest and best solution is to compile twice: once without -Werror and once with -Werror. This can be in two separate (and clearly labeled) runs or even consecutively. The first run shows all warnings and the second blocks the merge.

This is a bit similar to the `make || make -j1` technique that avoids (real) errors being drowned by many threads and confusing developers.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 21:43 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (2 responses)

Developers aren't looking through build logs. Instead, CTest gathers the output, does some filtering (e.g., we ignore third party warnings in CI) and uploads it to CDash for viewing. It also obviates the need for the `-j1` trick. We also don't need the second `-Werror` run (which pollutes the build cache) and instead just get the (post-filtered) warning count from CTest and trigger a script failure if it is non-zero.

I'll do an initial run on all of the CI configurations to get a survey of what is broken and then focus on what is broken after that (I don't build all of the configurations locally to know anyways).

Modern C for Fedora (and the world)

Posted Dec 10, 2023 0:53 UTC (Sun) by marcH (subscriber, #57642) [Link] (1 responses)

> Instead, CTest gathers the output, does some filtering

If you have a good test framework that does all that for you then you should absolutely ignore my previous post. Not everyone is that lucky. I mean many projects don't even have any pre-merge CI at all (yet?). Remember that the main article is about Fedora and others stepping up to rescue orphaned projects coded in ancient C. In such a context my simple advice above definitely holds because it's just one extra line in your CI configuration. Super cheap, very high value, and something people not familiar with CI may not have thought about.

> Developers aren't looking through build logs.

They don't by default (assuming of course you have developers in the first place...)

They definitely do when there's a CI red light somewhere that threatens the merge of their code and maybe their deadline. In such a case I know from first-hand experience that they really enjoy the simple "tricks" I recommended above.

> and uploads it to CDash for viewing.

I don't know anything about CDash but I know neither GitHub nor Jenkins nor Gitlab has any "yellow light"/warning concept, it's either green/pass or red/fail. Running twice with and without -Werror also solves that display limitation problem extremely cheaply. Again: if you have a smarter and better viewer then by all means ignore my tricks.

> We also don't need the second `-Werror` run (which pollutes the build cache)

Curious what you mean here.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 4:10 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

> I don't know anything about CDash but I know neither GitHub nor Jenkins nor Gitlab has any "yellow light"/warning concept, it's either green/pass or red/fail. Running twice with and without -Werror also solves that display limitation problem extremely cheaply. Again: if you have a smarter and better viewer then by all means ignore my tricks.

GitLab-CI does have a "warning" mode with the `allow_failure` key[1]. We use exit code 47 to indicate "warnings happened" so that the testing can proceed even though the build made warning noise. There are issues with PowerShell exit code extraction and that always hard-fails, but that seems to be a gitlab-runner issue (it worked before we upgraded for other reasons). It's actually nifty because it still reports as a `failed` *state* and the `allow_failure` key on the job just changes the render and "can dependent jobs proceed" logic, so our merge robot just sees that state and says "no" to merging.

> > We also don't need the second `-Werror` run (which pollutes the build cache)

> Curious what you mean here.

We have a shared cache for CI (`sccache`-based; `buildcache` on Windows). Adding another set of same-object output for a different set of flags just removes space otherwise ideally suited for storing other build results (*maybe* the object is deduplicated, but it doesn't seem necessary to me; probably backend-dependent anyways).

[1] https://docs.gitlab.com/ee/ci/yaml/#allow_failure

Modern C for Fedora (and the world)

Posted Dec 9, 2023 15:25 UTC (Sat) by Paf (subscriber, #91811) [Link]

Just a little context for my specific project, because I agree with the points you’re making about inconvenience. I work on an out of tree (GPL licensed!) file system project. We support a decent variety of distributions, but we have extensive CI so we catch stuff early, and since it’s a file system, it’s not common for people to try to build for/with something we don’t support. (And our formal position on support for kernels is “there’s a (fairly wide) list we test, otherwise good luck and we’re accepting patches”).

So we have circumstances that are a bit different, I think.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 1:24 UTC (Sat) by Wol (subscriber, #4433) [Link]

> On the other hand, if you are able to keep up with the load, it's about the best thing you can do. With each modernization push, enforced by making warnings fail hard, the amount of runtime errors can be brought down considerably, and by now nearly all issues we have are caused by missing or disabled static check.

That's what I did with a code base. Just worked through the codebase adding -W3 to each module in turn, and cleared all the errors. It took time, but the quality of the code base shot up, and loads of unexplained errors just disappeared :-)

Cheers,
Wol

Modern C for Fedora (and the world)

Posted Dec 11, 2023 14:48 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

A reasonable way to think about this is to treat all those compiler warnings as technical debt. Paying off that technical debt will be painful, especially if you have a lot of it, but it's probably worth it in the long run. The big cost will be when you take a project that has allowed the warnings to pile up and suddenly force everyone to spend time fixing those warnings rather than develop anything new. Dealing with new warnings as compilers change their mind about what deserves a warning will be more manageable. The main problem in that case is letting the compiler writers dictate when you pay off your technical debt rather than making the decision yourself.

Modern C for Fedora (and the world)

Posted Dec 11, 2023 16:46 UTC (Mon) by Wol (subscriber, #4433) [Link]

If the project isn't too big ...

We had a project where we couldn't suppress a particular warning (MSC v6, -W4, bought in library, unused arguments. Catch 22, we could fix warning A, but the fix triggered warning B, cue endless loop).

Anyways, our standards said "All warnings must be explained and understood". So that one we just ignored. There's no reason a project can't say "it's an old warning, we haven't got round to fixing it". But any new warning in modified code is an instant QA failure.

Cheers,
Wol

Modern C for Fedora (and the world)

Posted Dec 9, 2023 16:24 UTC (Sat) by jwarnica (subscriber, #27492) [Link]

Given the fundamental theory of CI/CD is to fix discovered problems "now", allowing them to be ignored seems counterproductive. By doing it always (er, "continuously") you force yourself to discover implicit assumptions you did not even know you made. That is as much true of code you wrote as it is of external libraries and the tools you use to build things.

Introducing a new compiler version is a significant step. Perhaps you will need a development branch to work through that, but you should either never change compiler versions, or actually do all that is needed when you do....

Which could well mean disabling particular checks in the build process. But if you said Wall, then you have implicitly deferred to the compiler people's taste.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 22:42 UTC (Sat) by quotemstr (subscriber, #45331) [Link] (2 responses)

What we really need is a date- or release-based -Wall. For example, one might write -Wall=gcc-12 and get all the warnings enabled with -Wall in GCC 12 and not any additional warnings GCC 13 might introduce. You could safely combine -Werror with -Wall this way.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 11:25 UTC (Sun) by joib (subscriber, #8541) [Link]

It might *help*, but it's probably not foolproof either, as a newer GCC release might have improved the -Wfoo diagnostics path to catch cases that the older release didn't catch. Of course you could, in principle at least, split the enhanced version into a separate -Wfoo-gcc-XY, or something like that, which is only activated when -Wall=gcc-XX isn't enabled. But I suspect GCC wouldn't want to commit itself to such a level of backward compatibility in the warnings. And of course if there's ever any refactoring of some particular warning, requiring to provide perfect backward compatibility for the previous 27 major releases would probably be prohibitive.

Modern C for Fedora (and the world)

Posted Dec 14, 2023 12:04 UTC (Thu) by spacefrogg (subscriber, #119608) [Link]

This is trivial to achieve. Just record the GCC version in your build scripts and drop -Werror when the compiler no longer matches it. Then update the recorded version once you are satisfied. This doesn't need any upstream support.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:36 UTC (Fri) by Hello71 (subscriber, #103412) [Link] (2 responses)

With register-based calling conventions, longs below 2^32 are usually passed unscathed. Furthermore, I believe x86-64 SysV ABI leaves the extended register bits (32-63) undefined, so in many cases they might accidentally have the right values.

Modern C for Fedora (and the world)

Posted Dec 11, 2023 8:10 UTC (Mon) by jengelh (guest, #33263) [Link] (1 responses)

In addition, the use of little endian causes low values behind a pointer argument or a stack-passed argument ("dword ptr [rsp+0x20]") to be accidentally "in the right spot". [In other words, whenever *(uint64_t *)ptr == *(uint32_t *)ptr.] It is a shame that big endian systems are going away.
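
A tiny demonstration of that effect (the cast technically violates strict aliasing and is for illustration only):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t x = 42;    /* a low value */
    /* On little endian the first four bytes of x hold the low word,
       so a 32-bit read through the same address also yields 42; on
       big endian it would yield 0. */
    uint32_t lo = *(uint32_t *)&x;
    printf("%u\n", (unsigned)lo);
    return 0;
}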

Modern C for Fedora (and the world)

Posted Dec 13, 2023 1:52 UTC (Wed) by marcH (subscriber, #57642) [Link]

This is exactly why little endian and big endian are not just two sides of the same coin.

Big endian is more "human-friendly" because you can read hexdumps "as is" (because humans use big endian too)

Little endian is more "computer-friendly" because of what you just explained.

In other words, Gulliver is wrong here.

About type inference coming to the C language as well

Posted Dec 10, 2023 8:57 UTC (Sun) by swilmet (subscriber, #98424) [Link] (23 responses)

In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhances code comprehension.
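
For instance, with C23's auto (a hedged sketch; the names are invented and a C23-capable compiler is assumed):

#include <stdlib.h>

struct symbol_table { int entries; };
static long lookup_count(struct symbol_table *t) { return t->entries; }

int main(void)
{
    /* Reasonable: the type is already visible (and long to write) on
       the right-hand side, so auto only removes repetition. */
    auto tbl = (struct symbol_table *)calloc(1, sizeof(struct symbol_table));
    if (tbl == NULL)
        return 1;

    /* Questionable: nothing on this line says what type n has. */
    auto n = lookup_count(tbl);

    free(tbl);
    return (int)n;
}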

See this article that I wrote last night after reading this LWN article: About type inference

(the article is 2 pages long, a bit too long to copy here as a comment, I suppose).

About type inference coming to the C language as well

Posted Dec 10, 2023 9:00 UTC (Sun) by swilmet (subscriber, #98424) [Link]

(Oops, posted my comment as a sub-comment instead of a new top-level one, I clicked on the wrong reply button…)

About type inference coming to the C language as well

Posted Dec 10, 2023 11:52 UTC (Sun) by excors (subscriber, #95769) [Link] (20 responses)

I think an important point that's missing from your argument is that modern languages have much more sophisticated type systems than C, with features like generics, and modern libraries make use of those type systems, so type names are very commonly much longer (and sometimes impossible) to write. If you don't have type inference, the language will be restricted to much simpler types, and you lose the correctness and performance benefits of having more information statically encoded in types.

Like, using `auto` instead of `const char*` or `ArrayList<String>` isn't a huge benefit, because those are pretty simple types. But when you're regularly writing code like:

for (std::map<std::string, std::string>::iterator it = m.begin(); it != m.end(); ++it) { ... }

then it gets quite annoying, since the type name makes up half the line, and it obscures the high-level intent of the code (which is simply to iterate over `m`). (And that's not the real type anyway; `std::string` is the templated `std::basic_string<char>`, and the `iterator` is a typedef which is documented to be a LegacyBidirectionalIterator which is a LegacyForwardIterator which is a LegacyIterator which specifies the `++it` operation etc, so in practice you're not going to figure out how the type behaves from the documentation - you're really going to need a type-aware text editor or IDE, at least until you've memorised enough of the typical library usage patterns. That's just an obligatory part of modern programming.)

Or in Rust you might rely on type inference like:

let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();

where you can see the important information (that it ends up with a vector of ints), and you can assume `v` is some sort of iterable thing but you don't care exactly what. Writing it explicitly would be something terrible like:

let v: std::iter::Map<std::str::SplitAsciiWhitespace<'_>, impl Fn(&str) -> i32> = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();

except that won't actually work because the `'_` is referring to a lifetime which I don't think there is any way to express in code; and the closure is actually an anonymous type (constructed by the compiler to contain any captured variables) which implements the `Fn` trait, and you can only use the `impl Trait` syntax in argument types (where it's a form of generics) and return types (where it's a kind of information hiding), not in variable bindings, so there's no way to name the closure type. Rust's statically-checked lifetimes and non-heap-allocated closures are useful features that simply can't work without type inference.

About type inference coming to the C language as well

Posted Dec 10, 2023 21:52 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (5 responses)

Yeah, I'm a caveman, often working in Vim without any special development features, yet I am not bothered at all to see e.g. let chars = foo.bar().into_iter();

Sure, I have no idea what "type" chars actually is, but it's clearly some sort of Iterator, and somebody named it chars, I feel entitled to assume it impl Iterator<Item = char> unless it's obvious in context that it doesn't.

If anything I think I more often resent needing to spell out types for e.g. constants where I'm obliged to specify that const MAX_EXPIRY: MyDayType = 398; rather than let the compiler figure out that's the only correct type. I don't hate that enough to think it should be changed, it makes sense, but I definitely run into it more often than I regret not knowing the type of chars in a construction like let chars = foo.bar().into_iter()

However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

About type inference coming to the C language as well

Posted Dec 10, 2023 22:07 UTC (Sun) by mb (subscriber, #50428) [Link]

>so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

Yes, that is true.

Type inference works well in Rust due to its strict type system.
But a subset of Rust's type inference will probably work well in C.

About type inference coming to the C language as well

Posted Dec 10, 2023 23:00 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (3 responses)

> However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

I would agree with this. The main concern I can think of is how C handles numeric conversions. They are messy, complicated, and I always have to look them up.[1] They can mostly be summarized as "promote everything to the narrowest type that can represent all values of both argument types, and if an integer, is at least as wide as int," but that summary is wrong (float usually *can't* represent all values of int, but C will just promote int to float anyway). Throwing type inference on top of that mess is probably just going to make things worse.
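
A small illustration of the float case in C:

#include <stdio.h>

int main(void)
{
    int big = 2147483647;    /* INT_MAX */
    float f = 0.0f;
    /* The usual arithmetic conversions promote big to float, which
       cannot represent 2147483647 exactly; it rounds to 2147483648. */
    printf("%.1f\n", big + f);    /* prints 2147483648.0 */
    return 0;
}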

By contrast, Rust has no such logic. If you add i32 + i16, or any other situation where the types do not match, you just get a flat compiler error.

I do wish Rust would let me write this:

let x: i32 = 1;
let y: i16 = 2;
let z: i32 = x + y.into(); // Compiler error!

(Presumably this is because you can also add i32 + &i32, and the compiler isn't quite smart enough to rule out that override.)

The compiler suggests writing this abomination, which does work:

let z: i32 = x + <i16 as Into<i32>>::into(y);

But at least you can write this:

let x: i32 = 1;
let y: i16 = 2;
let y32: i32 = y.into();
let z: i32 = x + y32;

[1]: https://en.cppreference.com/w/c/language/conversion

About type inference coming to the C language as well

Posted Dec 11, 2023 3:08 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (2 responses)

And, after posting this comment, I've realized that the reason into() doesn't work is because you're *supposed* to write this instead:

let z: i32 = x + i32::from(y);

Obviously I need to spend more time studying Rust, or maybe actually sit down and write a toy program in it.

Finally, I should note that you can write "y as i32", but that's less safe because it will silently do a narrowing conversion. from() and into() can only do conversions that never lose data, and there's also try_from()/try_into() if you want to handle overflow explicitly.

About type inference coming to the C language as well

Posted Dec 11, 2023 13:08 UTC (Mon) by gspr (guest, #91542) [Link] (1 responses)

> and there's also try_from()/try_into() if you want to handle overflow explicitly.

And there's try_from().expect("Conversion failure") for those cases where you wanna say "man, I don't really wanna think about this, and I'm sure the one type converts to the other without loss in all cases my program experiences – but if I did overlook something, then at least abort with an error message instead of introducing silent errors".

About type inference coming to the C language as well

Posted May 8, 2024 15:41 UTC (Wed) by adobriyan (subscriber, #30858) [Link]

Rust could allow "u32 = u32 + u8" with its usual overflow checks, but not "u32 + i8".

The "messy numeric conversions" are largely due to rubber types and the fact that there are lots of them
(5 main, __uint128, size_t, uintptr_t, intmax_t, ptrdiff_t). POSIX doesn't help with off_t.

If all you have is what Rust has, C is not _that_ bad.

The kernel has a certain number of min(x, 1UL) expressions just because x is "unsigned long", but it is clear that the programmer wants typeof(x).

About type inference coming to the C language as well

Posted Dec 11, 2023 4:43 UTC (Mon) by swilmet (subscriber, #98424) [Link] (13 responses)

It's true that in C++ and Rust, types can be quite long to write.

Both C++ and Rust have a large core language, while C has a small core language.

I see Rust more as a successor to C++. C programmers in general - I think - like the fact that C has a small core language. So in C the types remain small to write, and there are more function calls instead of using sophisticated core language features. C is thus more verbose, and verbosity can be seen as an advantage.

Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.

About type inference coming to the C language as well

Posted Dec 11, 2023 8:39 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (12 responses)

> Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.

I disagree with this, assuming that "safe" means "cannot cause UB outside of an unsafe block." A safe version of C needs at least the following:

* Lifetimes and borrow checking, which implies a type annotation similar to generics.
* Type inference, or else you have to write lifetime annotations everywhere.
* Box<T> or something equivalent to Box<T>, or else you can't put big objects on the heap and move their ownership around.
* Arc<RwLock<T>> or some equivalent, or else you have no reasonable escape hatch from the borrow checker (other than unsafe blocks).
* Rc<RefCell<T>> or some equivalent, or else you have to use the multithreaded escape hatch even in single-threaded code.
* And then there are many other optimizations such as using Mutex<T> instead of RwLock<T>, or OnceCell<T> instead of RefCell<T>. All of these have valid equivalents in C, and should be possible to represent in our hypothesized "safe C" (without needing more than a minimal amount of unsafe, preferably buried somewhere in the stdlib so that "regular" code can be safe).

I just don't see how you provide all of that flexibility without doing monomorphization, at which point you're already 80% of the way to reinventing Rust.

About type inference coming to the C language as well

Posted Dec 11, 2023 11:10 UTC (Mon) by Sesse (subscriber, #53779) [Link] (3 responses)

I guess that if you banned threads and pointers (presumably requiring lots of globals) and made all array access bounds-checked and all data zero-initialized, you could get a safe C subset without going there. How useful it would be would be a different thing...

About type inference coming to the C language as well

Posted Dec 11, 2023 13:52 UTC (Mon) by farnz (subscriber, #17727) [Link] (2 responses)

If you're not careful, you end up with something like Wuffs. A perfectly useful language in some domains, but deliberately limited in scope to stop you writing many classes of bug.

About type inference coming to the C language as well

Posted Dec 14, 2023 10:55 UTC (Thu) by swilmet (subscriber, #98424) [Link] (1 responses)

Seems useful to write command-line programs, for example.

About type inference coming to the C language as well

Posted Dec 14, 2023 10:57 UTC (Thu) by farnz (subscriber, #17727) [Link]

You're not going to get very far when you can't access arguments, or do I/O. Wuffs is deliberately limited to not doing that, because it's dangerous to mix I/O with file format parsing.

About type inference coming to the C language as well

Posted Dec 11, 2023 11:35 UTC (Mon) by swilmet (subscriber, #98424) [Link] (7 responses)

I'm not an expert in programming languages design and security-related matters.

But why not try a C-to-Rust transpiler? (random idea).

By keeping a small core language with the C syntax, and having a new standard library that looks like Rust but uses more function calls instead.

The transpiler would "take" the new stdlib as part of the language, for performance reasons, and translate the function calls to Rust idioms.

A source-to-source compiler is of course not ideal, but that's how C++ was created ("C with classes" was initially translated to C code).

About type inference coming to the C language as well

Posted Dec 11, 2023 12:09 UTC (Mon) by farnz (subscriber, #17727) [Link] (6 responses)

You might want to look at the C2Rust project; the issue is that a clean transpiler to Rust has to use unsafe liberally, since C constructs translate to something that can't be represented in purely Safe Rust.

The challenge then becomes adding something like lifetimes (so that you can translate pointers to Rust references instead of Rust raw pointers) without "bloating" C. I suspect that it's impossible to have a tiny core language without pushing many problems into the domain of "the programmer simply must not make any relevant mistakes"; note, though, that this is not bi-directional, since a language with a big core can still push many problems into that domain.

About type inference coming to the C language as well

Posted Dec 12, 2023 10:32 UTC (Tue) by swilmet (subscriber, #98424) [Link] (5 responses)

I didn't know about C2Rust; it shows that my random idea is not stupid after all :)

But I had the idea to convert (a subset of) C to _safe_ Rust, of course. Instead of some Rust keywords, operators, etc. (the core language), there would be C functions.

Actually the GLib/GObject project is looking to have a Rust-like way of handling things, see:
https://www.bassi.io/articles/2023/08/23/the-mirror/
(but a bit long to read, and one needs to know the GObject world to understand the blog post I think).

Anyway, that's an interesting topic for researchers. Then making it useful and consumable for real-world C projects is yet another task.

About type inference coming to the C language as well

Posted Dec 12, 2023 10:43 UTC (Tue) by farnz (subscriber, #17727) [Link]

The hard part is not the keywords and operators - it's the lifetime annotation system. Lifetimes are a check on what the programmer intended, so have to be possible to write as an annotation to pointer types in the C derived language, but then to be usable force you to have a generics system (since you want many things to be generic over a lifetime) with (at least) covariance and invariance possible to express.

And once you have a generics system that can express covariance and invariance for each item in a set of generic parameters, why wouldn't you allow that to be used for types as well as lifetimes? At which point, you have Rust traits and structs, and most of the complexity of Rust.

About type inference coming to the C language as well

Posted Dec 12, 2023 11:34 UTC (Tue) by mb (subscriber, #50428) [Link] (3 responses)

>But I had the idea to convert (a subset of) C to _safe_ Rust, of course.

That is not possible, except for very trivial cases.

The C code neither includes enough information (e.g. lifetimes) for that to work, nor is it usually structured in a way that allows it to work.

Programming in Rust requires a different way of thinking and a different way of structuring your code. An automatic translation of the usual idiomatic C programs will fail so hard that it would be easier to rewrite from scratch than to translate and then fix the compile failures.

About type inference coming to the C language as well

Posted Dec 13, 2023 23:59 UTC (Wed) by swilmet (subscriber, #98424) [Link] (2 responses)

The C syntax alone is not enough, but comments with annotations can be added, and become part of the language.

I started to learn Rust but dislike the fact that it has many core features ("high-level ergonomics"). It's probably possible to use Rust in a simplistic way though, except maybe if a library forces you to use the fancy features.

About type inference coming to the C language as well

Posted Dec 14, 2023 9:37 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

You could avoid using those libraries, and limit yourself to libraries that have a "simple" enough interface for you (no_std libraries are a good thing to look for here, since they're designed with just core and maybe alloc in mind, not the whole of std) - bearing in mind that you don't need to care how those libraries are implemented if it's just about personal preference.

In general, though, I wouldn't be scared of a complex core language - all of that complexity has to be handled somewhere, and a complex core language can mean that complexity is being compiler-checked instead of human-checked.

About type inference coming to the C language as well

Posted Dec 14, 2023 11:07 UTC (Thu) by swilmet (subscriber, #98424) [Link]

The codebases that I maintain already use between two and three/four main programming languages (welcome to GNOME, I should say). At some point I wanted to write new code in Rust, but it means adding more complexity and being less productive for some time while learning the language.

"Soft"ware, they said :-)

About type inference coming to the C language as well

Posted Dec 10, 2023 12:03 UTC (Sun) by Wol (subscriber, #4433) [Link]

> In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhances code comprehension.

Have a variable type of "infer"? That way, an undeclared variable is still an error, but you can explicitly tell the compiler to decide for itself :-)

Cheers,
Wol

Modern C for Fedora (and the world)

Posted Dec 10, 2023 19:59 UTC (Sun) by geert (subscriber, #98403) [Link]

It could have been fatal on m68k, too, as integer types are returned in register d0, and pointer types in register a0.
However, gcc still seems to add "move.l %a0,%d0" at the end of any function returning a pointer type.

Modern C for Fedora (and the world)

Posted Dec 22, 2023 6:06 UTC (Fri) by glandium (guest, #46059) [Link]

zlib absolutely did, until very recently.

Why Fedora and GCC only?

Posted Dec 8, 2023 17:32 UTC (Fri) by Hello71 (subscriber, #103412) [Link] (2 responses)

It's a bit disappointing that Fedora and GCC get so much focus in this article. This work has been going on for a while, but as far as I know, the recent push over the last year was instigated by Clang 15 quietly making implicit function declarations errors by default, which was reverted: https://discourse.llvm.org/t/configure-script-breakage-wi..., but reapplied for Clang 16 with a more prominent release note: https://releases.llvm.org/16.0.0/tools/clang/docs/Release.... While Fedora has certainly contributed to the modern C effort, it seems unfair to praise them and leave Gentoo (particularly Sam James) and other distributions without a mention. Similarly, while GCC has had grumblings about this for a while, as far as I know Clang was the one to make an official statement that they were actually going to for-real change the default, giving distributions leverage to push upstreams to fix their bad code.

Why Fedora and GCC only?

Posted Dec 9, 2023 10:50 UTC (Sat) by fw (subscriber, #26023) [Link] (1 responses)

Fully agreed regarding Gentoo, their Clang compilation efforts, and the contributions from Sam James and others. They have been really helpful, especially with their focus on contributing patches upstream where feasible. It's also great that their approach to detecting silent miscompilation is quite different, which adds additional verification.

There must have been similar efforts going on for Homebrew and Macports and the various BSDs, to increase Clang compatibility, but compared to the Gentoo effort, I have seen fewer upstream contributions. Personally, I found that rather disappointing. I'm not aware of the Clang upstream project making assessments similar to the ones we made for GCC regarding overall impact, and taking active steps to manage that. Or Apple when they switched Xcode to more errors well before upstream Clang, as I understand it.

The Clang change, along with Fedora's express desire to offer Clang as a fully supported compiler to package maintainers, certainly provided some justification for tackling these issues, and opened up even some limited additional resources (and every bit helps for this). But were it not for Gentoo's contributions, I think the practical impact of the earlier Clang change would have been pretty limited unfortunately.

Why Fedora and GCC only?

Posted Dec 12, 2023 7:18 UTC (Tue) by areilly (subscriber, #87829) [Link]

Clang is the system cc on FreeBSD, and has been for a few years. It (well, some parts of it) doesn't compile without warnings. Much/most of the ports tree builds with it too, although some ports depend on gcc. Perhaps I'm misunderstanding the argument...

Obsolete C for you and me

Posted Dec 8, 2023 21:38 UTC (Fri) by saladin (subscriber, #161355) [Link] (105 responses)

I personally use a lot of old C code which uses some of these "misfeatures". It's not clear in this article whether the Clang & GCC folks intend to keep this old, perfectly functional, code compiling. Will I need to pass an extra command line flag? Patch my compiler to restore the old behaviour?

I know the standard response is to either fix the software or stick with old compilers, but why? Current compilers work exactly as intended, already warn about the dangers of these constructs, and if Fedora wants to eliminate use of these constructs, then they can use -Werror or patch their build of GCC. Enabling new C language features should not have to break old C code; the compilers already treat different versions differently wrt. keywords and especially the 'auto' keyword in C23.

Also, these tricks are very fun to exploit when it comes to golfing.

Obsolete C for you and me

Posted Dec 8, 2023 21:47 UTC (Fri) by tshow (subscriber, #6411) [Link] (1 responses)

With GCC, at least, you can `-std=c90` and build classic C.

Obsolete C for you and me

Posted Dec 9, 2023 10:55 UTC (Sat) by fw (subscriber, #26023) [Link]

More generally, you need to use -fpermissive, a new option (for the C front end) in GCC 14.

There are a couple of problematic projects out there which explicitly rely on C99 and later features which are not available as GNU extensions with -std=c90 or -std=gnu90, particularly the new inlining semantics. If you just throw in -fpermissive, it also remains active if the build system automatically selects -std=c11 (for example), despite the use of language features that were removed in C99. (The relative order of -fpermissive and the C dialect options does not matter, unlike for the C dialect options themselves.)

Obsolete C for you and me

Posted Dec 8, 2023 21:54 UTC (Fri) by tshow (subscriber, #6411) [Link] (26 responses)

Many of these features are still available even in modern C with some tweaks. `intptr_t`, for instance, gives you an `int` that can hold a pointer value, so if (say) your critbit trie is stealing the low bit from node pointers for leaf vs. internal markup, you can do that with no compiler warnings at the cost of a bit of casting.

As for `auto`, sure, it used to have a different meaning ("not register"), but it was a useless keyword from what I could tell; everything was `auto` by default, and I don't believe you could do `auto register int i;` to cancel the `register` markup. Any existing code using it will be doing something like `auto i = 3;` which in old money implied `auto int i = 3;` and in new money will type infer to `int i = 3;`. I strongly suspect it would be hard to intentionally craft an example where the change in meaning of `auto` actually breaks, and if it's possible at all I'd guess it would involve some fairly hairy macro magic.
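
A hedged sketch of the low-bit tagging described above, using uintptr_t (the unsigned sibling, which is a better fit for the masking):

#include <stdint.h>

/* Mark an aligned node pointer as a leaf by setting its low bit. */
static inline void *tag_leaf(void *p)
{
    return (void *)((uintptr_t)p | 1u);
}

static inline int is_leaf(const void *p)
{
    return (int)((uintptr_t)p & 1u);
}

/* Strip the tag before dereferencing. */
static inline void *untag(void *p)
{
    return (void *)((uintptr_t)p & ~(uintptr_t)1);
}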

Obsolete C for you and me

Posted Dec 9, 2023 3:26 UTC (Sat) by zev (subscriber, #88455) [Link] (25 responses)

> Any existing code using it will be doing something like `auto i = 3;` which in old money implied `auto int i = 3;` and in new money will type infer to `int i = 3;`. I strongly suspect it would be hard to intentionally craft an example where the change in meaning of `auto` actually breaks, and if it's possible at all I'd guess it would involve some fairly hairy macro magic.

Would something with a float literal be a simple example? Certainly not something one would normally expect to see, but with the mountains of legacy code I wouldn't be surprised to find such instances floating around...

I don't happen to have a compiler with support for the new (C23) meaning of 'auto' lying around, but if it behaves like it does in C++:

$ cat foo.c
#include <stdio.h>
int main(void)
{
    auto i = 3.9;
    printf("%d\n", i);
}
$ gcc -w -o foo foo.c && ./foo
3
$ g++ -w -o foo foo.c && ./foo
-2146603272

Obsolete C for you and me

Posted Dec 9, 2023 4:23 UTC (Sat) by tshow (subscriber, #6411) [Link] (24 responses)

I suspect a lot of C code old enough to have `auto` used in the old way probably predates most hardware floating point support. It's possible there's code like that out there, but I wouldn't put money on it.

Obsolete C for you and me

Posted Dec 9, 2023 14:16 UTC (Sat) by willy (subscriber, #9762) [Link] (23 responses)

Hardware floating point has been around since the IBM 704 in 1954. It was also available on many PDP-11 models. The Sun-3 had an FP unit. I'm not sure where C was being written in the 1970s and 1980s that didn't have an FP unit.

Obsolete C for you and me

Posted Dec 9, 2023 15:24 UTC (Sat) by Wol (subscriber, #4433) [Link] (2 responses)

> I'm not sure where C was being written in the 1970s and 1980s that didn't have an FP unit.

Intel before the (was it) 486? Or more likely the 386. Which iirc we're talking early 90s. I'm sure I was using a load of 286 computers when I started that new job in 1989 ...

Cheers,
Wol

Obsolete C for you and me

Posted Dec 11, 2023 10:23 UTC (Mon) by taladar (subscriber, #68407) [Link] (1 responses)

IIRC 386 and 486 had SX and DX variants and the former had no FPU and the latter did.

Obsolete C for you and me

Posted Dec 11, 2023 11:02 UTC (Mon) by mjg59 (subscriber, #23239) [Link]

That's true for the 486, but no 386 had a built-in FPU - the 386SX had a 16-bit data bus and the 386DX had a 32-bit one.

Obsolete C for you and me

Posted Dec 9, 2023 15:31 UTC (Sat) by Paf (subscriber, #91811) [Link] (7 responses)

Well, there were examples, for sure - the Apple II didn’t have hardware FP, for instance. But it’s certainly not the case that hardware FP support wasn’t common (at least on big stuff).

Obsolete C for you and me

Posted Dec 9, 2023 15:35 UTC (Sat) by willy (subscriber, #9762) [Link] (6 responses)

Were people really writing significant amounts of C on bittyboxes like the Apple II? My background in that era was a lot of BASIC, some Pascal, some assembler. Fortran and C were for the Real Computers in the data centre.

Obsolete C for you and me

Posted Dec 9, 2023 15:50 UTC (Sat) by Wol (subscriber, #4433) [Link]

I dunno about significant, and I couldn't put a date on it, but early 90s I took over maintaining C programs running on PCs. That was Microsoft C 5. And we made extensive use of overlays to get round the 640K/1M memory limit. So I guess there would have been quite a lot of PC C round about that time.

(That was the program(s) I set -W3 / -W4 on.)

Cheers,
Wol

Obsolete C for you and me

Posted Dec 10, 2023 1:30 UTC (Sun) by Paf (subscriber, #91811) [Link] (1 responses)

I don’t *think* so, but I wasn’t around for that - I’m in my 30s, I was just pretty sure several early personal computers lacked hardware FP because I vaguely remember when it started appearing in the 90s(?).

The point is just that with and without hardware FP both existed, I guess.

Obsolete C for you and me

Posted Dec 10, 2023 11:11 UTC (Sun) by Wol (subscriber, #4433) [Link]

Just to throw it into the mix, the minis I worked on had microcoded Fixed Point BCD. And it was fast, even if as the programmer you had to be careful to take care of the decimal point ...

Cheers,
Wol

Obsolete C for you and me

Posted Dec 10, 2023 3:11 UTC (Sun) by mjg59 (subscriber, #23239) [Link] (2 responses)

Not so much, but it was certainly the case for Amigas and STs and only very high end versions of those had FPUs

Obsolete C for you and me

Posted Dec 10, 2023 3:19 UTC (Sun) by jake (editor, #205) [Link] (1 responses)

> but it was certainly the case for Amigas and STs and only very high end versions of those had FPUs

hmm, i wrote the code for my 3D graphics grad school class in C on an Amiga 1000 in 1986 or 7 ... i suppose it is possible that it was all software floating-point, but i certainly did not encounter any problems in that regard ...

jake

Obsolete C for you and me

Posted Dec 10, 2023 3:26 UTC (Sun) by mjg59 (subscriber, #23239) [Link]

The A1000 was a 68000, which definitely had no hardware FPU. The first Amiga that shipped with an FPU by default was the A3000. Compilers would happily let you use floats and just fall back to software emulation for that.

Obsolete C for you and me

Posted Dec 9, 2023 15:51 UTC (Sat) by pizza (subscriber, #46) [Link] (11 responses)

> Hardware floating point has been around since the IBM 704 in 1954. It was also available on many PDP-11 models. The Sun-3 had an FP unit. I'm not sure where C was being written in the 1970s and 1980s that didn't have an FP unit.

"Has been around" and "available on many models" is a _long_ way from "can assume it's generally/routinely available" especially in the 1970s and 1980s.

Indeed, it wasn't until the early 1990s that personal computers of any sort could be expected to have a built-in FPU (eg i486 in 1989, 68040 in 1990). ARM didn't have an _architectural_ FP spec until the v7 family (ie Cortex-A) which didn't launch until the early 2000s.

Even in the UNIX realm, SPARCv7 didn't have an architecturally defined FPU, and many different ones were bolted onto the side. SPARCv8 (~1990) formally added an architectural FPU [1], but it was still technically optional, and many implementations (SPARCv8a) lacked all or part of the FPU instructions. DEC Alpha launched in 1992 with a built-in FPU, but its predecessors (ie VAX, PDP) didn't necessarily come with FP hardware either, as you yourself mentioned. MIPS defined an FP spec, but it was an external/optional component until the R4000/MIPS-III in 1991. Unusually, PA-RISC appears to have had an FPU in all of its implementations, which started arriving in 1988.

So, no, you couldn't generally rely on having an FP unit until the early 1990s, and even then you had to be using fairly new equipment. Prior to that, FPUs were an (expensive!) option that you only opted for if you needed one. Everyone else had to make do with (very slow) software-based FP emulation, or rewrite their algorithms to use fixed-point mathematics. The latter approach is _still_ used wherever possible when performance is critical.
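
As a concrete illustration of the fixed-point approach, here is a minimal Q16.16 multiply of the sort such rewrites lean on (my own sketch, not from any particular codebase):

#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t q16_16;

/* Multiply two Q16.16 values: widen to 64 bits so the intermediate
   product can't overflow, then shift away the extra 16 fraction bits. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> 16);
}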

Heck, even today, the overwhelming majority of the CPU cores shipped still lack any sort of FPU, and I'd bet good money most of those are running predominately C or C++ codebases. (Yes, I'm referring to microcontrollers...)

[1] Incidentally, SPARCv8 was the basis for the IEEE754 floating point specification.

Obsolete C for you and me

Posted Dec 9, 2023 16:27 UTC (Sat) by pizza (subscriber, #46) [Link] (1 responses)

> ARM didn't have an _architectural_ FP spec until the v7 family (ie Cortex-A) which didn't launch until the early 2000s.

Correction -- like so many other ARM things, there was a wide variety of floating-point units operating with different instruction sets; it wasn't until armv7 that you could expect/rely on a consistent FP baseline that worked the same way.

(The first ARM processor with the option of FP support was the ARM6 (armv3) in 1991)

Obsolete C for you and me

Posted Dec 9, 2023 16:44 UTC (Sat) by willy (subscriber, #9762) [Link]

AFAIK the only change to ARM FP insns has been the introduction of VFP (analogous to the MMX/SSE/... transition on x86). The Weitek coprocessor had its own insn set, but the FP emulator translated ARM FP insns into Weitek insns. The FPA10, FPA11 and FPEmulator all had the same insn set.

Obsolete C for you and me

Posted Dec 9, 2023 16:37 UTC (Sat) by willy (subscriber, #9762) [Link] (8 responses)

SPARCv8 was not the basis for IEEE 754. That standard was issued in 1985, and the first v8 chips were released in 1992. Wikipedia has:

"This standard was significantly based on a proposal from Intel, which was designing the i8087 numerical coprocessor; Motorola, which was designing the 68000 around the same time, gave significant input as well."

And yes, I'm aware that personal computers didn't have much hardware FP available, but my contention is that there wasn't much C being written on PCs of that era.

Also, I don't think an "architectural spec" is particularly meaningful. I was active in the ARM scene and I remember the Weitek coprocessor, the FPA10, FPA11 and the infamous mixed endian FP format. People used floating point with or without hardware, and with or without an architectural spec.

Obsolete C for you and me

Posted Dec 9, 2023 17:15 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

> And yes, I'm aware that personal computers didn't have much hardware FP available, but my contention is that there wasn't much C being written on PCs of that era.

Well, I can think of at least one major program from that era ... the linux kernel ... (which was originally written for one of the early 386's, no?)

Cheers,
Wol

Obsolete C for you and me

Posted Dec 9, 2023 21:34 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

While true, Linux avoids FP (and SIMD) in its own code.

Obsolete C for you and me

Posted Dec 9, 2023 19:08 UTC (Sat) by pizza (subscriber, #46) [Link] (5 responses)

> And yes, I'm aware that personal computers didn't have much hardware FP available, but my contention is that there wasn't much C being written on PCs of that era.

During the 70s and early 80s, sure, not a lot of C on "personal" (ie non-UNIX) computers. But by the late 80s, that had changed.

Lattice C was released for DOS in 1982. Microsoft repackaged it as Microsoft C 1.0 in 1983. Borland released Turbo C in 1987, and Watcom C was released in 1988 (and was the overwhelming favorite for game developers). GCC's first releases also landed in 1987.

While the 8087 FPU has been part of the x86 family since its introduction in the late 70s, it was an expensive option, and as a result very little software was written that could directly take advantage of it. That had nothing to do with the choice of programming language.

Obsolete C for you and me

Posted Dec 9, 2023 20:38 UTC (Sat) by willy (subscriber, #9762) [Link] (4 responses)

Thanks. I think our different experiences may be leading to our differing opinions. I was using floating point arithmetic in BBC BASIC on a 6502 in the 80s. Sure, it wasn't as fast as using integer arithmetic, but it was fast enough for my purposes.

https://beebwiki.mdfs.net/Floating_Point_number_format

If you're from a games background then the program is never fast enough ;-)

As an aside, I think the fact that Unix doesn't use floating point is quite crippling. If the sleep() syscall took a floating point argument, it would have meant we didn't need to add msleep(), usleep() (and I guess soon nsleep()). The various timespec formats would still need to exist (because you can't lose precision just because a file was created more than 2^24 seconds after the epoch), but _relative_ time can usually be expressed as a float. Indeed, Linux will round the sleep() argument -- see https://lwn.net/Articles/369549/
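
To make the idea concrete, here is a userspace sketch of such an interface, layered over nanosleep() (fsleep() is a hypothetical name, not an existing call):

#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#include <time.h>

/* Hypothetical fsleep(): relative time as a double, split into the
   seconds/nanoseconds pair that nanosleep() wants. Assumes a
   non-negative argument. */
static int fsleep(double seconds)
{
    struct timespec ts;

    ts.tv_sec  = (time_t)seconds;
    ts.tv_nsec = (long)((seconds - (double)ts.tv_sec) * 1e9);
    return nanosleep(&ts, NULL);
}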

Floating-point in syscalls (was Obsolete C for you and me)

Posted Dec 9, 2023 21:52 UTC (Sat) by dskoll (subscriber, #1630) [Link] (2 responses)

nanosleep has existed for quite some time, so no need for an nsleep.

I don't really see a need for supporting floating point in UNIX system calls like sleep. Seems like overkill to me.

difftime returns the difference between two time_t objects as a double. But seeing as time_t in UNIX has only one-second resolution, that seems a bit silly to me, unless it's to prevent overflow if you subtract a very large negative time from a very large positive time.

Floating-point in syscalls (was Obsolete C for you and me)

Posted Dec 9, 2023 21:58 UTC (Sat) by willy (subscriber, #9762) [Link] (1 responses)

Double makes more sense as the result of subtracting two timespecs. See

https://www.infradead.org/~willy/linux/scan.c

and think how much more painful it would be to use some fixed point format (like, I don't know, a timespec)

I'm sure I could use a single precision float for this purpose, but that would definitely stray into the realm of premature optimization.
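
A minimal version of the helper being described (my own naming, not lifted from scan.c):

#include <time.h>

/* Difference between two timespecs, in seconds, as a double. Fine for
   relative intervals: a double's 53-bit mantissa comfortably holds
   nanosecond-precision deltas. */
static double timespec_diff(const struct timespec *end,
                            const struct timespec *start)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}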

Floating-point in syscalls (was Obsolete C for you and me)

Posted Dec 9, 2023 23:09 UTC (Sat) by dskoll (subscriber, #1630) [Link]

Sure, yes, timespec has nanosecond precision. difftime takes arguments with only one-second precision.

Obsolete C for you and me

Posted Dec 10, 2023 19:40 UTC (Sun) by smoogen (subscriber, #97) [Link]

Wouldn't a floating-point 'sleep' then depend on how the architecture rounds whatever 'floating point' value down to clock 'steps', so you would end up with different 'small time' values on different computers (especially if the sleep was calculated from an equation before being set)? If you are already looking at 'millisecond' sleep times, you probably want to always sleep for the same amount, and not depend on floating-point math which might change from one CPU or firmware revision to the next.

Obsolete C for you and me

Posted Dec 8, 2023 21:59 UTC (Fri) by ErikF (subscriber, #118131) [Link] (5 responses)

I think that an additional C "standard" target would be helpful: if GCC wants to prohibit K&R by default that's fine with me, but if you could specify a `-std=kr` (or something like that) I would be content. The compiler won't be able to diagnose as many probable errors, for sure, but at least old stuff will compile until it can be looked at and ported.

And I agree that K&R C is a wonderful golfing language.

Obsolete C for you and me

Posted Dec 9, 2023 14:33 UTC (Sat) by Karellen (subscriber, #67644) [Link]

if you could specify a `-std=kr` (or something like that) I would be content.

Huh. For some reason I always thought it did. But looking back at a random selection of manuals, even going back to GCC 2.95, there's no such value.

Maybe I just got confused about C89/C90 still permitting K&R-style syntax.

Obsolete C for you and me

Posted Dec 10, 2023 18:42 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (3 responses)

Is K&R even a standard? I thought it was a book/manual. I tend to assume it predates things like UB and the as-if rule, for example. Of course you can just say "follow C89, but allow constructs X, Y, and Z that were only in the K&R book," but that feels like it should be spelled -std=c89 -fwhatever.

Obsolete C for you and me

Posted Dec 10, 2023 19:49 UTC (Sun) by smoogen (subscriber, #97) [Link] (2 responses)

I remember dealing with some code I had written to meet K&R second edition, which was based on ANSI C (so around 1990?), but which would not compile on some Sun or HP boxes. My mentor asked me if I had run the linter, which spewed out various complaints. Using the older edition and removing various 'sugar' to match the '77 C, the code compiled and ran. So my guess is that between '77(?) and '90, the original K&R was considered the implementation standard. [Because this turned into a regular problem, I ended up having both editions of K&R to remind myself what I might need to do with older code.]

Obsolete C for you and me

Posted Dec 10, 2023 20:36 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (1 responses)

Sure, that is a valid thing you might have to do as a user.

What I'm really getting at is, what is K&R C, from the compiler author's perspective? How do you know if a given compiler is a "valid" implementation of K&R? How do you know what optimizations are permitted? If the answer is "no optimizations are allowed, because there's no C abstract machine yet, so everything must be translated 1:1," then how do you decide what constitutes "1:1" output? Even the modern C standard does not define such a notion, and users would probably like to have some optimizations anyway.

I don't think there's any sensible answer to those questions that doesn't ultimately look like C89 with a bunch of -f options to enable K&R constructs, which is why there's no -std=kr option. The second edition explicitly acknowledges this limitation in the preface, and directs compiler authors to the C standard.

Obsolete C for you and me

Posted Dec 11, 2023 16:03 UTC (Mon) by smoogen (subscriber, #97) [Link]

Ah, OK, I see what you meant. [My only guess would be that the standard before '89 would have been the AT&T compiler for the Unixes of the day (aka it was written by the Gods and so it is what we must follow).] That, however, probably also leads to every compiler before then having a 'well, it only really works this way on that CPU and OS... we need it to do this on this arch' problem.

Obsolete C for you and me

Posted Dec 8, 2023 22:12 UTC (Fri) by mb (subscriber, #50428) [Link] (69 responses)

>whether the Clang & GCC folks intend to keep this old, perfectly functional, code compiling.

Well.
No. As the article says, no it will not be kept compiling.

If you want this legacy code to keep compiling, please use a 30-year-old compiler.

I am all for backwards compatibility.
Where it makes sense.

But there are limits.
This fixes real world problems that are known for decades.
There is no excuse for using these legacy C features.

Also: https://xkcd.com/1172/

Obsolete C for you and me

Posted Dec 9, 2023 7:09 UTC (Sat) by rrolls (subscriber, #151126) [Link] (35 responses)

> If you want this legacy code to keep compiling, please use a 30-year-old compiler.

I'll just go ahead and disagree with this.

Compilers should feel free to add as many new features as they like, including warnings/errors for usage of antiquated misfeatures. However, if compiler P turns source code Q into executable R when told to use some particular standard or version S, **it should continue to do so for all eternity**. Assuming Q is actually valid according to standard S, of course.

C compilers have been really good at this; a lot of other languages' compilers have not. This is one of the many reasons why, despite objectively not being a very good language these days compared to many others, C continues to be popular.

It's not just programming languages, either: you should **always** be able to use some program to turn some unchanged input into the same output, even if you have to update the program for one reason or another. TeX froze its design in 1990 for this exact reason, and it's still popular.

Imagine if when Python 3 came out, you had to write something like # python-standard: 3.0 at the top of every file to opt into all the new behavior. That would have avoided the entire Python 3 debacle. (And to be clear, I'm not at all a Python 2 stalwart; I'm a fan of modern Python. It's just the perfect example to use.)

I should not have to install a bunch of different versions of a compiler - all probably with their own conflicting system dependencies - just to compile a bunch of different programs from different eras, that aren't broken and don't need updating.

Heck, a few months ago I had a reason to use Python 2 to get some old Python 2 code running - and despite Python 2 not having been updated in years, it was incredibly simple to download the source code of Python 2.7.18, compile it and use it - because it was written in plain old C that compiles the same today, in gcc 12, as when it was written.

The "everything must be updated all the time because reasons, and it's fine for stuff to stop working once it's not been touched for even 2 years" concept is a modern concept, and not a very good one IMO.

Obsolete C for you and me

Posted Dec 9, 2023 8:21 UTC (Sat) by pm215 (subscriber, #98099) [Link] (3 responses)

Should gcc be able to compile sixth edition Unix code with its reversed increment operators ("x =+ 2;") and unnamed structs, or are we just debating at what point a particular version of C is obsolete enough that modern compilers need not support it?

Obsolete C for you and me

Posted Dec 12, 2023 13:33 UTC (Tue) by rrolls (subscriber, #151126) [Link] (2 responses)

Well, of course I'd say "x =- 2;" should be parsed as "x = (-2);", not as "x -= 2;".

However:

Did any actual C standard parse it as the latter?

If yes, then my point stands, meaning that if someone invokes a C compiler and explicitly tells it to use whatever old C standard that was, then that's the way it should be parsed. Does this cause a security issue? No! Because you have to explicitly specify that.

If no, then the point is moot, because if it's not specified behavior then I'd say the compiler is free to change its behavior as it pleases.
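
For reference, the construct under discussion, with both historical readings shown (illustrative only):

int x = 10;
x =- 2;   /* B and pre-1976 C: compound assignment, x becomes 8 */
          /* ANSI C onward: assignment of -2, x becomes -2      */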

Obsolete C for you and me

Posted Dec 12, 2023 14:08 UTC (Tue) by farnz (subscriber, #17727) [Link]

This now comes down to the definition of "actual C standard". No ISO C standard parsed it as the latter, but some K&R versions did. However, K&R wasn't a formal standard - it was more or less defined in terms of what the K&R compiler did.

Obsolete C for you and me

Posted Dec 12, 2023 14:13 UTC (Tue) by excors (subscriber, #95769) [Link]

The =+ style operators are used in the undated C Reference Manual at https://www.bell-labs.com/usr/dmr/www/cman.pdf

According to https://www.bell-labs.com/usr/dmr/www/chist.html :

> B introduced generalized assignment operators, using x=+y to add y to x. The notation came from Algol 68 [Wijngaarden 75] via McIlroy, who had incorporated it into his version of TMG. (In B and early C, the operator was spelled =+ instead of += ; this mistake, repaired in 1976, was induced by a seductively easy way of handling the first form in B's lexical analyzer.)

Obsolete C for you and me

Posted Dec 9, 2023 8:24 UTC (Sat) by matthias (subscriber, #94967) [Link] (2 responses)

> It's not just programming languages, either: you should **always** be able to use some program to turn some unchanged input into the same output, even if you have to update the program for one reason or another. TeX froze its design in 1990 for this exact reason, and it's still popular.

Unfortunately this stability does not hold for all the LaTeX packages that are used today. Output significantly changes from version to version. And who is still using plain TeX today?

In fact, the differences are usually small. But TeX wonderfully demonstrates the butterfly effect. Even the tiniest change in spacing can have huge changes several pages further down the document.

Obsolete C for you and me

Posted Dec 11, 2023 8:22 UTC (Mon) by jengelh (guest, #33263) [Link]

>Even the tiniest change in spacing can have huge changes several pages further down the document.

So basically how MSWord operated all those decades... :-p

Obsolete C for you and me

Posted Dec 12, 2023 13:35 UTC (Tue) by rrolls (subscriber, #151126) [Link]

OK, to be fair, that was just an example I'd heard of recently. I don't actually use TeX myself.

Obsolete C for you and me

Posted Dec 9, 2023 8:46 UTC (Sat) by marcH (subscriber, #57642) [Link] (1 responses)

> C compilers have been really good at this; a lot of other languages' compilers have not. This is one of the many reasons why, despite objectively not being a very good language these days compared to many others, C continues to be popular.

This is also why C is the least safe language in the world and the surest way to get hacked. Not the only way of course but the most likely by far.

I totally agree that software shouldn't be automatically bad just because it's old. But old, unmaintained C code is just bad and dangerous. This was basically the main point of the article, but you seem to have missed it.

Also, you seem to dismiss backwards compatibility in other languages a bit quickly.

Obsolete C for you and me

Posted Dec 12, 2023 13:48 UTC (Tue) by rrolls (subscriber, #151126) [Link]

> old, unmaintained C code is just bad and dangerous. This was basically the main point of the article but you seemed to have missed it.

The work mentioned in the article is good work! Opting in to new checks that disallow bad patterns, and then updating code to fix all the errors, is almost always a good thing to do.

My comment wasn't responding to the article. My comment was responding to the statement "If you want this legacy code to keep compiling, please use a 30-year-old compiler."

> Also, you seem to dismiss backwards compatibility in other languages a bit quickly.

"Backwards compatibility" these days usually tends to mean "we'll make your code spam log files with warnings for a year and then stop working altogether until you fix it".

Real backwards compatibility means that the intended and documented behavior of an old version of something will be kept (at least, once any necessary opt-ins have been performed), even if it's deemed to have some flaws.

I'm not calling for _exact_ behavior, such as bugs, to be retained - just behavior that is documented and intended (or at least was intended at the time it was documented).

Obsolete C for you and me

Posted Dec 9, 2023 8:51 UTC (Sat) by mb (subscriber, #50428) [Link] (18 responses)

>it should continue to do so for all eternity

I disagree. It should depend on how hard it is to fix the old code.

In the cases we are talking about, it's actually trivial to make your old code compile again with a modern compiler.
You have implicit function declarations? Just add explicit ones! It's trivial.
Implicit int? Just change it. It's trivial.
and so on...

Yes, I do understand that there are many packages and programs doing this, so overall it is a large amount of work. But it is trivial work. And it can easily be parallelized.
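
For instance, here is a minimal before/after sketch of the two fixes just mentioned (illustrative code, not from any real package):

/* Before: constructs GCC 14 turns into hard errors. */
static counter;                    /* implicit int              */
next() { return counter + 1; }     /* implicit int return type  */

/* After: the same code, trivially modernized. */
static int counter;
int next(void) { return counter + 1; }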

Obsolete C for you and me

Posted Dec 9, 2023 14:20 UTC (Sat) by willy (subscriber, #9762) [Link] (17 responses)

It's trivial for you & me, who are programmers. But for someone who wants to download, compile & run a program written 20 years ago, it may not be trivial.

Obsolete C for you and me

Posted Dec 9, 2023 14:26 UTC (Sat) by mb (subscriber, #50428) [Link] (15 responses)

> compile & run a program written 20 years ago, it may not be trivial.

That is nontrivial for soooo many more reasons.
Dependencies.

Obsolete C for you and me

Posted Dec 11, 2023 14:51 UTC (Mon) by wahern (subscriber, #37304) [Link] (14 responses)

Reliance on dependencies has grown significantly over the past couple of decades. And projects with many dependencies are not likely the kind of software you'll care about 20 years hence. A library or tool that does some specific task well is less likely to have many, if any, dependencies and more likely to be useful and desirable in 20 years.

A year ago or so I was pleasantly surprised when I attempted to compile the latest release of lrzsz, 0.12.20, from 1998 (https://www.ohse.de/uwe/software/lrzsz.html), and it almost compiled out-of-the-box[1] in a modern, stock macOS environment. It only took a few minutes to identify and fix the bitrot. Most of the handful of issues were missing header includes or header includes gated behind a bad feature test. Once it compiled to completion the remaining issues caught by the compiler were related to the 64-bit transition: some wrong printf specifiers and conflation of socklen_t with size_t. There may be other bugs, but it seemed to work once it compiled cleanly.

Also interesting (but less surprising) was how well the ./configure script held up, which was generated by autoconf 2.12 from 1996.

[1] Several were fixable with build flags: CFLAGS="-Wno-format-security" CPPFLAGS="-DSTDC_HEADERS" ./configure
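
Illustrative versions of the two 64-bit issues mentioned above (not the actual lrzsz patches):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static void report(int fd, const char *buf)
{
    /* printf specifier: "%d" happened to work while size_t and int
       were both 32 bits; C99's "%zu" is the portable fix on LP64. */
    size_t n = strlen(buf);
    printf("%zu bytes\n", n);

    /* socklen_t vs size_t: getsockname() takes a socklen_t *, which
       is a different type from size_t on 64-bit platforms. */
    struct sockaddr_storage addr;
    socklen_t len = sizeof(addr);
    getsockname(fd, (struct sockaddr *)&addr, &len);
}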

Obsolete C for you and me

Posted Dec 11, 2023 16:15 UTC (Mon) by pizza (subscriber, #46) [Link]

> Reliance on dependencies has grown significantly over the past couple of decades. And projects with many dependencies are not likely the kind of software you'll care about 20 years hence. A library or tool that does some specific task well is less likely to have many, if any, dependencies and more likely to be useful and desirable in 20 years.

I like the way you expressed this, and find myself in agreement.

Obsolete C for you and me

Posted Dec 11, 2023 16:59 UTC (Mon) by mb (subscriber, #50428) [Link] (1 responses)

>and it almost compiled out-of-the-box[1] in a modern, stock macOS environment.

You are actually just saying that it didn't compile.
Which is exactly my point. It's not possible for a non-programmer to take a >20 year old program and just compile it with a modern compiler in a modern environment. There are so many reasons for this to fail. You listed a few of them.

Removing the features this article is about from the compiler will not really worsen the situation by any meaningful amount.

Obsolete C for you and me

Posted Dec 11, 2023 19:20 UTC (Mon) by wahern (subscriber, #37304) [Link]

I just tested on OpenBSD/amd64 and it compiles and runs despite the diagnostics and bugs. What does that say? I don't think it changes anything.

The places where the code falters largely relate to 1) its support for pre-ANSI C library interfaces, 2) a newer hardware architecture exposing non-standard code, and 3) non-POSIX extension APIs (e.g. gettext). I think this says something positive regarding the value of standards and backward compatibility.

Notably, some of the code does use K&R parameter lists. (At least the getopt_long compat implementation does, but on OpenBSD it was properly excluded from the build.) I'm not advocating for continued support for K&R, just pushing back against the notion that old code, and support for old code, has little value. 20 years isn't even that long in the grand scheme of things, especially in the context of a systems language like C.

Obsolete C for you and me

Posted Dec 11, 2023 19:26 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (10 responses)

The thing is, lrzsz is likely full of exploitable bugs. This is fine as long as nobody uses it outside of code archeology or hobby projects. But what usually happens is that some clueless vendor sticks it in their WiFi router (or something similar) for firmware updates over the network. And now you have a vector for botnets.

So making it harder to compile is a feature, not a bug.

Obsolete C for you and me

Posted Dec 11, 2023 19:52 UTC (Mon) by pizza (subscriber, #46) [Link] (9 responses)

> So making it harder to compile is a feature, not a bug.

That's a good way to ensure your code never gets used by anyone other than yourself. And eventually not even yourself, I might add.

(In which case, why bother publicly publishing anything at all?)

Obsolete C for you and me

Posted Dec 11, 2023 19:58 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

> That's a good way to ensure your code never gets used by anyone other than yourself. And yourself too, I might add.

That's not such a bad outcome. Old and insecure code should not be used for new projects.

An analogy: I love our railroad park, and I'm helping to restore an old steam engine. We are even planning to run it on a public railroad some time next year. It's fun! But I for sure don't want these engines running nearby every day; they don't have any kind of emissions control, they have terrible efficiency, and they're just plain dangerous.

> (In which case, why bother publicly publishing anything at all?)

Mostly for historical/archival purposes.

Obsolete C for you and me

Posted Dec 11, 2023 20:04 UTC (Mon) by pizza (subscriber, #46) [Link] (7 responses)

> That's not such a bad outcome. Old and insecure code should not be used for new projects.

Newer == automatically better, got it.

Obsolete C for you and me

Posted Dec 11, 2023 20:36 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Not quite automatically, but yes.

Like pretty much everything else in engineering. The old code might be more compact and less resource-intensive, but it will almost 100% be less safe and robust.

Obsolete C for you and me

Posted Dec 11, 2023 20:52 UTC (Mon) by pizza (subscriber, #46) [Link] (5 responses)

> Newer == automatically better, got it.

I'm sorry, that was snarky and not what I meant to convey.

There's plenty of "newer" software that's grossly insecure or otherwise lacking in some way versus something that's been around a while. It usually takes a while to stabilize something into a generally usable form.

Meanwhile. When lrzsz was first published, it wasn't "old obsolete software" that should not be used for new projects. It was brand-new software, intended to be used by contemporary users. Saying that it shouldn't have been published, or should have been made more difficult to compile so as to discourage folks from using it, rather defeats the entire point of releasing it to begin with. And where would any of the F/OSS world be if that attitude was the norm?

What should this magic memory-hole/de-publishing cutoff point be? Months? Years? Decades?

One can't know in advance how long something will be actively developed or maintained. One can't know in advance how diligent users/integrators will be in ensuring stuff is kept up-to-date [1]. Meanwhile, its degree of maintenance tells you very little about its overall quality or suitability for a given purpose.

[1] Well, I suppose history shows us that the answer is "barely to never". How many routers are deployed running old, vulnerable versions of dnsmasq? How many are still shipping with these long-since-fixed vulnerabilities?

Obsolete C for you and me

Posted Dec 11, 2023 21:21 UTC (Mon) by mb (subscriber, #50428) [Link]

Whether something is old or new does not make a difference for quality, most of the time.
What makes a huge difference is whether it is maintained or not.
That protects against bit rot.

Obsolete C for you and me

Posted Dec 11, 2023 23:21 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> Saying that it shouldn't have been published or made more difficult to compile as to discourage folks from using it rather defeats the entire point of releasing it to begin with. And where would any of the F/OSS world be if that attitude was the norm?

lrzsz had been maintained, up until some point in the past. And for software that is being maintained, migrating to new toolchains every now and then is not a huge burden. So in a hypothetical world where we were still using modems, lrzsz would have been rewritten in hardened C (like ssh). But that's not what happened; at some point lrzsz lost its users and was abandoned. So this example is a success story of the full software lifecycle.

And that's why I'm fine with making lrzsz more complicated to compile. It hasn't been maintained for two decades, and using it as-is in production in current conditions borders on irresponsible, so it has to be thoroughly audited and fixed anyway.

> Meanwhile, its degree of maintenance tells you very little about its overall quality or suitability for a given purpose.

Honestly? It usually does. Unless you're looking into an area that naturally disappeared.

Obsolete C for you and me

Posted Dec 12, 2023 9:42 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

Your comment about old, vulnerable versions of dnsmasq skirts the edge of the underlying problem; we treat software as an undecaying asset that can be built again and again from the source code and forever treated as "perfect", when it's closer in behaviour to the plans for a building. Nobody sensible takes plans from 5 years ago and expects them to be built unchanged with today's tools, techniques and materials; there's a constant shifting of requirements that means that we change things on the fly.

For example, I had recent construction work done; the plans were under 12 months old at the point I handed them over to the builder who did the work, and yet several changes were needed as we went along because the plans made assumptions that no longer held - the specific fire resistant material that had been specified was no longer available, and had to be substituted, the assumptions about what they'd find when digging for foundations turned out to be wrong (so we could use a smaller foundation), the locations of utilities as documented turned out to be wrong (including some that would have affected our neighbours if our builders hadn't changed what they did to match reality instead of the plans), and the price of insulating materials had changed so that we were able to change the plans to get much better insulation for slightly more money.

And this is closer to the reality of software; the source code is the plans, and (unlike construction), the build is done by automation, which means that whenever the plans turn out to be outdated, they need updating to keep up with modern requirements; at least with construction, when the plans turn out to be outdated, the workers turning plans into a building are capable of changing the plan on-the-fly to reflect reality on the ground, whereas a compiler simply isn't capable of doing that.

Obsolete C for you and me

Posted Dec 12, 2023 21:22 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> at least with construction, when the plans turn out to be outdated, the workers turning plans into a building are capable of changing the plan on-the-fly to reflect reality on the ground, whereas a compiler simply isn't capable of doing that.

Or, if you've been relying on UB, the compiler turns out to be extra-capable of "changing the plan on-the-fly". But this is also because compilers take the source code as gospel and its "reality" is just some abstract fantasy machine.

And compilers do communicate through warnings and diagnostics all the time, but we're all too willing to ignore them at times.
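
A stock illustration of that point (not tied to any particular project): an overflow check that "worked" for years until optimizers started taking the standard at its word:

/* Signed overflow is undefined, so the compiler may assume a + 1
   never wraps and fold this whole test to 0 at -O2. Older, less
   aggressive compilers made it appear to work. */
int will_wrap(int a)
{
    return a + 1 < a;
}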

Obsolete C for you and me

Posted Dec 12, 2023 21:27 UTC (Tue) by farnz (subscriber, #17727) [Link]

Oh, the compiler can change the plan all right - it just can't do so to reflect the reality on the ground (since that requires intelligence spotting that the programmer "meant" this, but wrote that), but instead to reflect its understanding of what you wrote, even if that's not what you intended to write.

Obsolete C for you and me

Posted Dec 14, 2023 23:11 UTC (Thu) by fw (subscriber, #26023) [Link]

If you think it's trivial, please try to fix the type safety issues in the cohomolo package for GAP. Even using -flto -Wlto-type-mismatch (which enables type checking across translation units), this one still looks really difficult to me. Of course, most packages are not like that.
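
For those who haven't used the flag, a minimal sketch of the kind of cross-translation-unit mismatch it reports (hypothetical two-file example):

/* a.c */
int counter;                 /* defined as int */
void bump(void) { counter++; }

/* b.c */
extern long counter;         /* mismatched type: neither file alone
                                draws a diagnostic */
extern void bump(void);
int main(void) { bump(); return (int)counter; }

/* Building with "gcc -flto -Wlto-type-mismatch a.c b.c" should then
   warn at link time that the type of 'counter' does not match its
   original declaration. */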

Obsolete C for you and me

Posted Dec 9, 2023 17:22 UTC (Sat) by smoogen (subscriber, #97) [Link]

> I should not have to install a bunch of different versions of a compiler - all probably with their own conflicting system dependencies - just to
> compile a bunch of different programs from different eras, that aren't broken and don't need updating.
...
> The "everything must be updated all the time because reasons, and it's fine for stuff to stop working once it's not been touched for even
> 2 years" concept is a modern concept, and not a very good one IMO.

Having a single compiler which can compile all the old code out there might be seen as a 'fairly' modern concept also. I had to maintain at least 4 different compiler sets for each of the Unix systems I maintained 25 years ago. For the previous 10 years, it had been quite common for particular C code to only compile with one specific compiler from Sun or HP, etc. Most of the time this was due to a 'compiler-defined side-effect' which the code needed (aka you could compile it with a different compiler from Sun, but the code might produce different results in certain runs). And for many languages you might end up having compilers which only compiled one specific version, aka Fortran66, Fortran77, C77 or C90 (and the various Snobol, etc. ones). [While this was 25 years ago, I know many science, automotive and aeronautical systems still tend to keep multiple versions of the same compiler because it is expected that some code relied on 'side-effects'.]

It was usually the gcc compiler set which could compile the different older code bases (within limits). You still might end up with code which acted differently between gcc-2.x and gcc-2.(x+1), but that was mainly due to one of the various 'vendor defined' or similar areas where what the compiler and the hardware did could change what happened. These sorts of code issues usually ended up with long emails between whatever scientist had coded the original to match a specific behaviour and the compiler developers pointing out that the standard could be interpreted in that region in any way they felt. [Or in some cases, any way the underlying hardware firmware decided...]

Obsolete C for you and me

Posted Dec 11, 2023 11:57 UTC (Mon) by farnz (subscriber, #17727) [Link] (5 responses)

The problem we have, however, is that lots of legacy code Q is not valid according to any version of standard S, but an old version of compiler P happened to turn it into executable R. We wouldn't be having this discussion if we didn't have lots of legacy code that fell into the pit of "the program is ill-formed; no diagnostic is required" from various standards, where the program is not valid according to standard S, but a compiler does not have to tell you that you wrote an invalid program, but can produce an executable that may (or may not) do what you expect program Q to do.

Further, I disagree mildly with your premise - if Q is valid according to standard S, then there are many Rs that have the same behaviour on the real machine as program Q does on standard S's abstract machine. If, to choose an example, compiler P put a NOP in every delay slot as the easiest way to compile program Q to executable R, then I'd still like compiler P to be permitted to rearrange such that all delay slots do useful work if I compile program Q again. Unless, of course, by "compiler P", you mean a fixed version and set of flags to a compiler, and not later versions of compiler P, or changes to the compiler flags (e.g. switching from -O0 to -Os or -O3), in which case most compilers meet this requirement simply because they're deterministic.

Obsolete C for you and me

Posted Dec 12, 2023 14:18 UTC (Tue) by rrolls (subscriber, #151126) [Link] (4 responses)

> The problem we have, however, is that lots of legacy code Q is not valid according to any version of standard S, but an old version of compiler P happened to turn it into executable R.

I'm not concerned about legacy code that wasn't valid in the first place. If someone's relying on a behavior that's undocumented, or indeed where the documentation says it shouldn't be relied on, there's no need to keep that behavior at all.

> if Q is valid according to standard S, then there are many Rs that have the same behaviour on the real machine as program Q does on standard S's abstract machine. [...] I'd still like compiler P to be permitted to rearrange such that all delay slots do useful work if I compile program Q again.

OK, I wasn't quite specific enough on this note. It had occurred to me that someone might make the optimisation argument but I didn't want to get lost in the details of that. :) What I meant by "executable R" was "an executable that does what it's supposed to according to the requested standard and any relevant compiler flags that affect behavior". So if some compiler provides a way to explicitly ask for a deterministic build then yes, it should always generate the exact same output bit-for-bit, if the same input is given, no matter how many updates to the compiler there have been. However, that's if you explicitly request a deterministic build. If you just specify certain compile flags that permit a range of possible behaviors, then of course the output should be allowed to vary within the permitted behaviors, such as allowing new optimisations.

My key point was that language maintainers should make provisions for people to be able to compile old code with new compiler versions without undue effort, so that you can use a single compiler version (preferably installed systemwide) to compile a wide range of code, rather than having to have lots of different compiler versions installed. The corollary is that any given source code would work on a wide range of compiler versions. (And I would call on library maintainers to do similar, so that projects that depend on them will work on a wide range of library versions.)

Obsolete C for you and me

Posted Dec 12, 2023 14:56 UTC (Tue) by farnz (subscriber, #17727) [Link] (3 responses)

But the issue here is entirely with code that wasn't valid in the first place - legacy code is full of places where people rely on behaviours that are undocumented, or documented as not to be relied upon, but where the compiler that was used at the time did what the programmer expected that construct to do, while a modern compiler does not.

If we were only dealing with code that was perfectly valid according to a documented standard, there'd be a lot less noise on this topic. But we're not - we're dealing with legacy code that's never been valid according to the documented form of the language, but where many previous implementations of the language did what the programmer expected anyway. It's just that current compilers now do something different, thus "breaking" the code (since they're still matching what's documented, just not the same interpretation as the previous compiler and the original programmer).

Obsolete C for you and me

Posted Dec 13, 2023 8:22 UTC (Wed) by rrolls (subscriber, #151126) [Link] (2 responses)

In that case we're in agreement. My issue was with (mostly) modern languages where it's not seen as a problem for totally valid code to break in a couple years' time, and the idea that C might one day go that way too.

Obsolete C for you and me

Posted Dec 13, 2023 11:02 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

The underlying argument is about what defines "totally valid code"; one group (which I think you're part of) says that "totally valid code" is defined solely by reference to some form of standard, and another group that extends that definition to include custom-and-practice.

The second group argue that C compilers don't see it as a problem for "totally valid C" to break in a couple of years time, because they've got a construct that's been interpreted in a specific way by every compiler they've used in the last 50 years, up until the latest GCC or Clang version. This is mostly about a difference in definition; if you define "totally valid C" as "my compilers from 1995 to 2020 accepted this construct without diagnostics and interpreted it consistently", then you're going to view a 2023 compiler interpreting that code differently (with or without a diagnostic) as "new compiler can't handle totally valid code". Whereas the first group would say "this code was never valid, because C90 says it's not valid, but no diagnostic is required, and thus it's on you".

Obsolete C for you and me

Posted Dec 13, 2023 12:36 UTC (Wed) by Wol (subscriber, #4433) [Link]

> Whereas the first group would say "this code was never valid, because C90 says it's not valid, but no diagnostic is required, and thus it's on you".

The problem then comes when the standard declaring the code invalid (C90) postdates the code itself ... :-) (Like a lot of the C code I worked on)

Cheers,
Wol

Obsolete C for you and me

Posted Dec 9, 2023 11:52 UTC (Sat) by ballombe (subscriber, #9523) [Link] (32 responses)

This practice breaks bisection (git bisect) for no good reason.
Suddenly some commits that just missed a cast fail to build, breaking the bisection.

Obsolete C for you and me

Posted Dec 9, 2023 12:04 UTC (Sat) by mb (subscriber, #50428) [Link] (31 responses)

> for no good reason

That's not true. There are good reasons.

Obsolete C for you and me

Posted Dec 9, 2023 13:40 UTC (Sat) by pizza (subscriber, #46) [Link] (30 responses)

> That's not true. There are good reasons.

that should be "not _enough_ good reasons"

This scenario is describing a very real problem that folks with existing codebases have to deal with.

Obsolete C for you and me

Posted Dec 9, 2023 14:23 UTC (Sat) by mb (subscriber, #50428) [Link] (29 responses)

>very real problem

Can you name a project that
- was under active development in the last 5 years so that to-be-bisected bugs have been introduced and
- uses these legacy C features?

I think that projects only fall into three categories:
1) They have been unmaintained for decades. No need to bisect.
2) They are maintained and already build with -Wall so that these legacy problems don't exist.
3) A very small group of projects that have very poor code quality and are actively maintained.

No "very real" problem for 1) and 2).
3) should not be used anyway. The fix is to rewrite them.

Obsolete C for you and me

Posted Dec 9, 2023 15:10 UTC (Sat) by makendo (guest, #168314) [Link] (2 responses)

NetHack is known to use legacy function definitions as late as 2021:

boolean
is_edible(obj)
register struct obj *obj;
{
    /* ... */
}

The next revision of C is deprecating legacy function definitions. The development branch has since switched to modern function definitions, but the switch wasn't backported to the 3.6.x releases and Gentoo maintainers have forced -std=gnu89 as a result.
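
For comparison, the prototype-style form the development branch moved to would look like this (my reconstruction, not the actual NetHack change):

boolean
is_edible(struct obj *obj)
{
    /* ... */
}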

Obsolete C for you and me

Posted Dec 9, 2023 15:30 UTC (Sat) by fw (subscriber, #26023) [Link]

Those definitions were declared obsolescent in the second edition of the standard, in 1999. According to published drafts, the next revision of the standard will remove them from the language altogether. I don't know what compilers will do about it. C23 also introduces unnamed parameters. Curiously, the syntax is not ambiguous even for compilers which still support implicit int, but it is a rather close call.
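
A small illustration of the C23 feature, and of why the grammar stays (just barely) unambiguous (my examples):

/* C23: a definition may leave a parameter unnamed. */
int handler(int code, void *)
{
    return code;    /* the second argument is simply unused */
}

/* The close call: in "int g(t);", t is an unnamed parameter type only
   if it names a typedef in scope; otherwise old compilers read it as
   a K&R identifier list. */
typedef int t;
int g(t);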

Obsolete C for you and me

Posted Dec 10, 2023 18:50 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

A few years ago, I distinctly remember NetHack announcing[1] that they were going to "start using ANSI C features," along with a caveat that they had not yet decided which features they were going to use, and they also indicated that anyone who couldn't deal with ANSI C should contact them to discuss the situation. To my understanding, "ANSI" C just means C89.

[1]: https://nethack.org/devel/deprecation.html#361

Obsolete C for you and me

Posted Dec 9, 2023 16:10 UTC (Sat) by pizza (subscriber, #46) [Link] (23 responses)

>- was under active development in the last 5 years so that to-be-bisected bugs have been introduced and uses these legacy C features?

I can't name something that uses these specific legacy C features, but I help maintain one project [1] that is extremely sensitive to the toolchain used [2], making bisecting quite challenging when you have to cross a toolchain boundary and the old toolchain can't even be compiled on more modern systems.

> 1) They are unmaintained since decades. No need to bisect.

You make the same mistake as so many others by equating "unmaintained" with "unused" -- Disabusing folks of this notion is the entire point of this article.

[1] Rockbox, replacement firmware for a wide variety of MP3 players. Currently supporting a couple dozen platforms representing four major CPU architectures. It runs bare-metal, under SDL, and as a native Linux application that has to run on both ancient and modern userspaces.
[2] To the point we supply our own that needs to be built from sources

Obsolete C for you and me

Posted Dec 9, 2023 16:26 UTC (Sat) by mb (subscriber, #50428) [Link] (22 responses)

>You make the same mistake as so many others by equating "unmaintained" with "unused"

No, I didn't say that.
I said that if there are no changes then there is no need to bisect.

Obsolete C for you and me

Posted Dec 9, 2023 16:31 UTC (Sat) by pizza (subscriber, #46) [Link] (21 responses)

You're not bisecting the upstream project, you're bisecting your local project that builds the (possibly modified) upstream project in its own local tree.

So you fix the failures in your tree so you can continue building it with modern toolchains, but when you need to go back and bisect *your own code*, this vendored code no longer builds, forcing you to backport those changes at each bisection step.

This sort of thing can be _very_ common.

Obsolete C for you and me

Posted Dec 9, 2023 16:42 UTC (Sat) by mb (subscriber, #50428) [Link] (20 responses)

>this vendored code

It's a ticking time bomb for so many more reasons to vendor, or even just depend on, unmaintained code.
It should have been priority number one to get rid of it decades ago, before it exploded.

Such old code will often blow up in your face when compiled with modern optimizing compilers, regardless of the proposed changes from the article.

In fact, I would actually *prefer* the build breakage over a subtle "miscompilation" due to decades-old code not playing by the rules of the C machine model or having implicit types and declarations.

Obsolete C for you and me

Posted Dec 9, 2023 18:45 UTC (Sat) by pizza (subscriber, #46) [Link] (17 responses)

> It should have been priority number one to get rid of it decades ago before it exploded.

Why? It had been working just fine.

I'm using in production a bit of software that literally hasn't been updated in nearly three decades. Replacing it with anything else would require a nontrivial amount of effort, for no measurable gain.

(I had to do a little bit of work to make it compile on 64-bit targets but that's been the extent of its maintenance in the past 20 years)

Obsolete C for you and me

Posted Dec 9, 2023 19:02 UTC (Sat) by mb (subscriber, #50428) [Link] (16 responses)

>Why?

To avoid building up more and more technical debt and to prevent it from exploding.

>It had been working just fine.

Until it exploded the bomb was just fine.

Obsolete C for you and me

Posted Dec 9, 2023 19:13 UTC (Sat) by pizza (subscriber, #46) [Link] (15 responses)

> To avoid building up more and more technical debt and to prevent it from exploding.

Are you volunteering to pay me to do this work?

Or is this just yet another example of someone demanding that I perform unpaid work on their behalf?

Obsolete C for you and me

Posted Dec 9, 2023 19:43 UTC (Sat) by mb (subscriber, #50428) [Link] (14 responses)

> Are you volunteering to pay me to do this work?

Nope. It's your project.

> Or is this just yet another example of someone demanding that I perform unpaid work on their behalf?

No. Not at all. I am not demanding anything.
Feel free to keep piling up as much technical debt as you like.
It's your decision.

But please don't complain, if it explodes.

Obsolete C for you and me

Posted Dec 9, 2023 22:56 UTC (Sat) by pizza (subscriber, #46) [Link] (13 responses)

> Feel free to keep piling up as much technical debt as you like.

What you call "piling up technical debt" everyone else calls "priorities"

> But please don't complain, if it explodes.

Um, I'm not. Once again, you presume something not in evidence.

None of this stuff "explodes" on its own. Indeed, it works just fine in the environments it's been used in for (as you put it) "decades". However, the article, and this discussion, is about how a _new_ environment has come along that causes (usually trivially-fixed) compilation failures in code that hasn't needed significant maintenance for "decades". How is that property not a _good_ thing? What is this modern fascination with constantly reinventing the wheel just to stay in place?

Obsolete C for you and me

Posted Dec 10, 2023 0:59 UTC (Sun) by marcH (subscriber, #57642) [Link] (12 responses)

> What is this modern fascination with constantly reinventing the wheel just to stay in place?

While that fascination is real, it's absolutely not what this article and discussion is about. You're angry and not listening.

Obsolete C for you and me

Posted Dec 10, 2023 2:39 UTC (Sun) by pizza (subscriber, #46) [Link] (11 responses)

> While that fascination is real, it's absolutely not what this article and discussion is about. You're angry and not listening.

Seriously?

I'm being scolded for using ancient software that does exactly what I need it to do, solely because it's just a matter of time before it "explodes", causing me all manner of problems. Instead, I should switch to something actively developed.

That presumes that (1) an alternative with the necessary functionality exists, and (2) the transition cost is low to nonexistent. It also exaggerates the scope and effect of the actual problem (ie a compile-time problem that is, most of the time, pretty trivial to resolve).

I've also pointed out, multiple times, that this article shows that "not actively developed" does not mean "not actively used", and the right-now cost of incrementally fixing this old software is far less than replacing it entirely.

(Anecdotally, those calling for wholesale replacements/rewrites/etc, or otherwise telling F/OSS authors/maintainers/distributors/users/etc what they "should" be doing, never seem to be the ones doing the actual work or helping cover its cost. I won't apologize for calling out that abusive, entitled behaviour.)

Obsolete C for you and me

Posted Dec 10, 2023 5:17 UTC (Sun) by marcH (subscriber, #57642) [Link] (1 responses)

You're not being scolded and this article is not about rewrites. Rewrites have not been mentioned in the article and barely mentioned in the comments (which is actually very surprising considering this is C)

Obsolete C for you and me

Posted Dec 10, 2023 15:43 UTC (Sun) by pizza (subscriber, #46) [Link]

> You're not being scolded and this article is not about rewrites.

....You injected yourself into the tail end of a sub-thread that was about just that.

(Meanwhile, I agree that the article _wasn't_ about that, a point I've repeatedly made)

Obsolete C for you and me

Posted Dec 10, 2023 15:58 UTC (Sun) by mb (subscriber, #50428) [Link] (8 responses)

>I'm being scolded

That's not true. It's your choice and I respect that choice.
But you have to live with the consequences of your own decisions.
It was your decision to use C features that have been deprecated, and have drawn warnings, for decades.

>for using ancient software that does exactly what I need it to do, solely because it's just a matter of time before it "explodes", causing me all manner of problems.

Yes. It's called bit-rot.
Environments change and perfectly good software becomes an ancient mess.

>Instead, I should switch to something actively developed.

That is *one* of the possible options that have been pointed out here.

>"not actively developed" does not mean "not actively used"

Yes. But nobody claimed that it would mean that.

>telling F/OSS authors/maintainers/distributors/users/etc what they "should" be doing

Nobody is telling you what you should do. That's a misinterpretation on your side.
Feel free to keep depending on unmaintained software. That is your choice and I am fine with that.

But I'm not fine with it, if you want to prevent certain developments of the C language itself, just to keep your ancient and trivially fixable code working.

Obsolete C for you and me

Posted Dec 10, 2023 18:03 UTC (Sun) by pizza (subscriber, #46) [Link] (6 responses)

> Feel free to keep depending on unmaintained software. That is your choice and I am fine with that.

I'm willing to bet that, on a daily basis, you trust your physical safety (if not your life) to "unmaintained software".

For example, the _newer_ of my two vehicles was manufactured 22 years ago. Any support/warranty/part supply/recall obligations its manufacturer had completely ceased seven years ago. If the software running in its ECU, ABS, and safety/airbag modules doesn't qualify as "unmaintained" then nothing else possibly could.

Meanwhile, *every* computer I have Linux installed upon is running completely unmaintained firmware -- the newest one fell out of support about a year ago. Does this mean I should just scrap the lot?

My point? "unmaintained" doesn't mean that it's automatically bad, untrustable, or incapable of fulfilling its purpose. Secondly, "maintained" in of itself tells you very little. Indeed, the Fedora folks' efforts with these old packages is itself a form of maintenance!

Going back a few posts, the "unmaintained production" software I mentioned earlier that you chided me for relying upon? It's a glorified data logger in a closed environment. It's been in production for approximately two decades, and it's "unmaintained" because *it hasn't needed any maintenance* in the past three years. It does what's needed, so what is to be gained by messing with it? What exactly is supposed to "explode" in this context? This particular bit of ancient software is actually the most reliable portion of the entire system!

Obsolete C for you and me

Posted Dec 10, 2023 18:45 UTC (Sun) by mb (subscriber, #50428) [Link] (1 responses)

> If the software running in its ECU, ABS, and safety/airbag modules doesn't qualify as "unmaintained"
> then nothing else possibly could.

Well, yes. I know. In my day job I write this software.

Probably the majority of the software in such a thing is "frozen". It will not be developed any further to add new features.
But it's not at all "unmaintained", because if problems do come up, they will get fixed.
This is enforced by law.

>Meanwhile, *every* computer I have Linux installed upon is running completely unmaintained firmware
>Does this mean I should just scrap the lot?

Nope. You completely missed my point again.

I am not at all talking about binary firmware sitting in devices. It's completely fine to keep using the same binary firmware for an infinite amount of time. It will not become worse with time.

I am talking about the legacy source code that is used in new compilations with modern compilers today.
That is a completely different thing.

> What exactly is supposed to "explode" in this context?

If you try to recompile it with a modern compiler, any number of things can break.
That is what this discussion is about.

Obsolete C for you and me

Posted Dec 10, 2023 19:35 UTC (Sun) by marcH (subscriber, #57642) [Link]

> ... binary firmware sitting in devices. It's completely fine to keep using the same binary firmware for an infinite amount of time. It will not become worse with time.

Only if it's really "airtight" (because new attack techniques appear constantly) and the use cases never, ever change.

Even in such a case, the company will likely want to re-use and evolve that source in some newer product. Then, as you wrote, the binary is fine but the source is not.

"Zero maintenance" software can exist for sure but in many cases people who wish they don't have to pay for maintenance are just burying their head in the sand not to see technical debt.

Software maintenance has absolutely nothing to do with a fascination for shiny new things. It's actually the exact opposite. Confusing the two is not far from insulting the people performing thankless maintenance work. Unlike pseudo-"inventors"[*], they're never in the spotlight. Kudos to LWN for this article.

[*] search the Internet for "myth of the lone inventor"

Obsolete C for you and me

Posted Dec 11, 2023 10:51 UTC (Mon) by farnz (subscriber, #17727) [Link] (3 responses)

A twenty-year-old binary built with Diab 5.0 either works or it doesn't; that's not going to change just because GCC 13.2 has a more thorough understanding of the C standard than GCC 3.1 (roughly contemporary to Diab 5.0). If you rebuild from the same sources today with Diab 5.0, you'll still get a working binary - nothing has changed, so nothing new fails.

Further, you can do your changes (if any are needed) with Diab 5.0 as the compiler, and it will interpret the code the same way it did 20 years ago. Where you run into trouble is with code that assumed some underspecified behaviour would always be implemented the way the compiler of the day implemented it; and even then, only if you change the compiler. If you don't change the binary, it doesn't matter; if you change the source but reuse the same compiler, it's (usually) fine.

The problem comes in when you change two things at once: both the compiler in use (maybe even as small a change as switching from the PowerPC 440 to the Arm Cortex-M7 backend in the same compiler binary) and the source code. At that point, you have the risk of a problem that could be anywhere, since most languages don't (arguably can't) tell you if the behaviour of the compiler has changed in a way that would surprise the last programmer to touch the code. This applies to Rust, too; for example, integer overflow is underspecified in Rust by this standard (two possible outcomes: panic or two's-complement wrapping), and if the last programmer to touch the code didn't think about this, then you have room for a problem where only a panic is acceptable behaviour, but instead you get silent wrapping.
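
To make the "underspecified behaviour" trap concrete in C terms, here is a sketch of my own (not from any real project), using signed overflow as the example: older compilers compiled the test below as a wrapping comparison, while a modern optimizer is entitled to assume signed arithmetic never overflows and fold the test to a constant.

#include <limits.h>
#include <stdio.h>

static int will_overflow(int x)
{
    /* Signed overflow is undefined in C, so the compiler may
     * legitimately rewrite this "check" to simply return 0. */
    return x + 1 < x;
}

int main(void)
{
    /* Prints 1 or 0 depending on the compiler and optimization level. */
    printf("%d\n", will_overflow(INT_MAX));
    return 0;
}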

Obsolete C for you and me

Posted Dec 11, 2023 17:02 UTC (Mon) by pizza (subscriber, #46) [Link] (2 responses)

> The problem comes in when you change two things at once: both the compiler in use (maybe even as small a change as switching from the PowerPC 440 to the Arm Cortex-M7 backend in the same compiler binary) and the source code. At that point, you have the risk of a problem that could be anywhere

...I'd argue a switch from a (usually) big-endian CPU to a (nearly always) little-endian one is a pretty significant change, to say nothing of subtleties like the memory-ordering model and how unaligned accesses are handled.
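
The classic byte-order trap looks something like this contrived sketch (not real ECU code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t word = 0x11223344;
    uint8_t *bytes = (uint8_t *)&word;

    /* Big-endian (e.g. PowerPC 440): prints 0x11.
     * Little-endian (e.g. Cortex-M7): prints 0x44. */
    printf("first byte: 0x%02x\n", bytes[0]);
    return 0;
}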

But yes, change out a major portion of the compile or runtime environment (and/or other fundamental requirements) and the code may need updating. Change multiple things at once... you're likely in for a world of pain.

Obsolete C for you and me

Posted Dec 11, 2023 17:33 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

But then we come back round to the beginning - why are you rebuilding code with a new compiler if no requirements have changed, and expecting it to behave exactly as it did when built with the old compiler? This goes double if your code depends on specifics of how the old compiler interpreted the code, rather than being code whose meaning is unambiguous.

And that, of course, leads to the big problem with legacy code - much of it (in all languages) is written "knowing" that if it passes tests when built with a single compiler, then it's good enough. But change anything (compiler, inputs, other bits and pieces), and it stops working.

Obsolete C for you and me

Posted Dec 11, 2023 19:48 UTC (Mon) by pizza (subscriber, #46) [Link]

> But then we come back round to the beginning - why are you rebuilding code with a new compiler if no requirements have changed, and expecting it to behave exactly as it did when built with the old compiler?

Well, if nothing changes, then... you don't need to do anything. (That was kinda my point with respect to my using "unmaintained" software in a production environment.)

But more typically, requirements do change... eventually. You rarely know what those will be in advance, or what effort will be needed to handle them.

Obsolete C for you and me

Posted Dec 10, 2023 19:20 UTC (Sun) by marcH (subscriber, #57642) [Link]

To be pedantic: "bit-rot" can happen even inside the same repo. It's not unusual to have some unused part of the code not even compiled on a regular basis. Then someone wants to resurrect it or to just increase coverage and surprise: it's not compatible with the rest anymore!
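
A contrived sketch (names made up) of how that happens with conditionally compiled code:

#include <stdio.h>

struct config {
    int timeout_ms;            /* renamed from 'timeout' at some point */
};

#ifdef ENABLE_LEGACY_STATS     /* no regular build defines this... */
static void dump_stats(const struct config *c)
{
    /* ...so nobody noticed that this still uses the old field name,
     * and it no longer compiles when someone re-enables it. */
    printf("timeout: %d\n", c->timeout);
}
#endif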

A small digression, sorry.

Obsolete C for you and me

Posted Dec 9, 2023 18:52 UTC (Sat) by ballombe (subscriber, #9523) [Link] (1 responses)

> In fact, I would actually *prefer* the build breakage over a subtle "miscompilation" due to decades old code not playing with the rules of the C machine model or having implicit types and declarations.

Which miscompilation are you talking about?
Assignment between different pointer types is about impossible for the compiler to get wrong, whether or not casts are used.

Obsolete C for you and me

Posted Dec 9, 2023 19:01 UTC (Sat) by mb (subscriber, #50428) [Link]

>Which miscompilation are you talking about?

Well, I said what I was talking about:

>subtle "miscompilation" due to decades old code not playing with the rules of the C machine model or having implicit types and declarations.

Just try to compile a 30-40 year old C program. Chances are good that it just won't work.

Obsolete C for you and me

Posted Dec 9, 2023 16:57 UTC (Sat) by andresfreund (subscriber, #69562) [Link]

> Can you name a project that
> - was under active development in the last 5 years so that to-be-bisected bugs have been introduced and
> - uses these legacy C features?

Postgres used ~two instances of "Missing parameter types in function definitions" until somewhat recently, mainly because replacing them made the code look worse.
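
For the curious, the construct looks like this (a made-up example, not the actual Postgres code):

int bump(n)    /* 'n' has no declared type, so it defaults to int */
{
    return n + 1;
}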

Obsolete C for you and me

Posted Dec 9, 2023 17:37 UTC (Sat) by ballombe (subscriber, #9523) [Link]

> 1) They are unmaintained since decades. No need to bisect.

Why ?

> 2) They are maintained and already build with -Wall so that these legacy problems don't exist.

False: -Wall does not prevent commits that generate a warning from being pushed to a Git repository.
Especially if the CI system only tests the tip of the branch, not all the intermediate commits.
In any large code base there will always be some small percentage of commits that generate warnings.

Besides, maintainers come and go, and making the lives of new maintainers miserable by pretending they are responsible for the state of the repository before they took over does not serve anyone's purpose.

Most current C projects have their own memory-management layer (if only to deal with out-of-memory conditions), which will likely need to convert between pointers of different types. It is quite easy to miss a cast (especially since the rules for C++ and C are different).
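
A minimal sketch of what I mean (xmalloc() and the node type are made up):

#include <stdlib.h>

/* typical project-specific allocator: aborts instead of returning NULL */
static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (!p)
        abort();
    return p;
}

struct node { int value; struct node *next; };

static struct node *new_node(void)
{
    /* In C the void * converts implicitly; C++ would demand a cast,
     * and a habitual cast here can hide a wrong-type assignment. */
    struct node *n = xmalloc(sizeof *n);
    n->value = 0;
    n->next = NULL;
    return n;
}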

What does it mean that Vala is not seeing development?

Posted Dec 9, 2023 3:41 UTC (Sat) by ebiederm (subscriber, #35028) [Link] (1 responses)

I took a quick look: Vala had a release about a year ago, and its development branch has a commit only a week old.

That does not seem inactive to me.

What does it mean that Vala is not seeing development?

Posted Dec 9, 2023 10:31 UTC (Sat) by fw (subscriber, #26023) [Link]

Yes, I didn't mean to imply that the Vala transpiler wasn't maintained in general. It's just that type errors in generated C code have been known (and reported to the Vala developers) for quite some time and have not been addressed, so it's reasonable to assume that fixing them is not straightforward given Vala's implementation model, and unlikely to happen soon.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 4:01 UTC (Sat) by makendo (guest, #168314) [Link] (1 responses)

I definitely welcome these being turned into errors for C99 and above. Any code still using them should turn on -std=c89 or -ansi in its Makefile.

My main complaint is that assignments between incompatible pointers shouldn't always be an error when the target is declared in the same statement (i.e. an initialization); otherwise you have to spell out the pointer type twice in a single statement, which I see as unnecessary verbosity.
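
For instance (parse() is a made-up function), with the new hard error the cast becomes mandatory and the pointer type appears twice in one statement:

struct packet;              /* opaque */
struct packet *parse(void); /* made up for illustration */

void demo(void)
{
    /* "unsigned char *" has to be written on both sides of the '=' */
    unsigned char *raw = (unsigned char *)parse();
}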

Modern C for Fedora (and the world)

Posted Dec 9, 2023 19:18 UTC (Sat) by ianmcc (subscriber, #88379) [Link]

Use auto to infer the type from the cast.
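
Presumably meaning something like this sketch (parse() is again a made-up function; requires -std=c23):

struct packet;
struct packet *parse(void);

void demo(void)
{
    auto raw = (unsigned char *)parse(); /* raw has type unsigned char * */
}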

Modern C for Fedora (and the world)

Posted Dec 9, 2023 16:50 UTC (Sat) by NightMonkey (subscriber, #23051) [Link] (2 responses)

This is not on topic, exactly, but I sure wish "The C Programming Language" could be updated to reflect modern patterns, yet retain the tone and pedagogical approach of the original. It is such a well-written and well-presented book! Cheers.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 0:27 UTC (Sun) by makendo (guest, #168314) [Link] (1 responses)

I doubt it will ever receive a third edition for C23, as one of the original authors has died.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 9:05 UTC (Sun) by swilmet (subscriber, #98424) [Link]

We will need to wait until the book falls into the public domain. There will still be C code to maintain when that happens.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 3:51 UTC (Sun) by david.a.wheeler (subscriber, #72896) [Link] (1 responses)

I proposed discussing these options as recommendations for C and C++ compilation as part of the OpenSSF guidance on compiler options:

https://github.com/ossf/wg-best-practices-os-developers/i...

Modern C for Fedora (and the world)

Posted Dec 10, 2023 3:57 UTC (Sun) by david.a.wheeler (subscriber, #72896) [Link]

For context, here is the link to the current version of the OpenSSF
"Compiler Options Hardening Guide for C and C++":

https://best.openssf.org/Compiler-Hardening-Guides/Compil...

Modern C for Fedora (and the world)

Posted Dec 11, 2023 17:43 UTC (Mon) by eru (subscriber, #2753) [Link]

A nit, but I think the description of the implicit function declaration is a bit off. It is not a function that takes no parameters, but one with an unknown parameter list. It also assumes external linkage. It is equivalent to:

int f();
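
For contrast (g() is a made-up name), a declaration that truly promises no parameters looks different:

int f();     /* unknown parameter list: f(42) is accepted unchecked (before C23) */
int g(void); /* genuinely no parameters: g(42) is a compile-time error */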

