
Modernizing Fedora's C code

By Jake Edge
November 2, 2022

It is not often that you see a Fedora change proposal for a version of the distribution that will not be available for 18 months or so, but that is exactly what was recently posted to the mailing list. The change targets the C source code in the myriad of packages that the distribution ships; it would fix code that uses some ancient compatibility features that were removed by the C99 standard but are still supported by GCC. As might be guessed from the long runway proposed, there is quite a bit of work to do to get there.

As usual with Fedora change proposals, this one was posted to the Fedora devel mailing list on behalf of its owner, Florian Weimer, by Fedora program manager Ben Cotton; it is also available in an updated form on the Fedora wiki. At the moment, Fedora 37 is imminent, but the proposal targets Fedora 40, which is currently slated for the northern-hemisphere Spring of 2024. The goal, as described by the title, is "Porting Fedora to Modern C".

Old C

There are several C features that were removed from the language in the C99 standard, but are still accepted by default in GCC, so use of them exists in the extensive code base that makes up Fedora. The idea would be to start working on cleaning those up, hopefully in collaboration with other distributions, and getting those changes into the upstream projects. There are six different constructs that are targeted, but the most important is removing implicit function declarations:

This legacy compatibility feature causes the compiler to automatically supply a declaration of type int function() if function is undeclared, but called as a function. However, the implicit return type of int is incompatible with pointer-returning functions, which can lead to difficult debugging sessions if the compiler warning is missed, as described here.

On 64-bit systems, implicitly returning a 32-bit int can result in a truncated pointer, as was described in the linked blog post. Similarly, functions returning _Bool that are called after an implicit int declaration may have their return values misinterpreted because they do not modify all 32 bits of the return register.
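The truncation hazard can be sketched in a few lines of C. This is only an illustration, not code from the proposal: make_greeting() and low_32_bits() are hypothetical names. The fix is simply to have a prototype in scope before the call; low_32_bits() simulates the part of a pointer that an implicit int return type would preserve.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, declared before use so the compiler knows it
 * returns a pointer. Without this prototype, a pre-C99 compiler would
 * assume "int make_greeting()" at the call site. */
char *make_greeting(const char *name);

/* Simulates what an implicit int declaration does to a returned
 * pointer on a 64-bit system: only the low 32 bits are kept. */
uintptr_t low_32_bits(const void *p)
{
    return (uintptr_t)p & UINT32_C(0xffffffff);
}

char *make_greeting(const char *name)
{
    static const char prefix[] = "hello, ";
    size_t len = sizeof prefix + strlen(name); /* sizeof counts the NUL */
    char *out = malloc(len);

    if (out == NULL)
        return NULL;
    strcpy(out, prefix);
    strcat(out, name);
    return out;
}
```

On a 64-bit system, comparing low_32_bits(p) with (uintptr_t)p for a heap pointer shows whether the upper half would have been lost; whether it actually differs depends on where the allocator places the memory.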

Another problem identified is that, before C99, the following code would implicitly define two int variables since the type is not specified:

    static i = 1.0;
    auto j = 2.0;

Detecting and removing those two "implicit-int" features is complicated by the use of Autoconf in many of the packages that are part of Fedora:

Neither change is trivial to implement because introducing errors for these constructs (as required by C99) alters the result of autoconf configure checks. Quite a few such checks use an implicitly declared exit function, for instance. These failures are not really related to the feature under test. If the build system is well written, the build still succeeds, the relevant features are automatically disabled in the test suite and removed from reference ABI lists, and it's not immediately apparent that feature is gone. Therefore, some care is needed that no such alterations happen, and packages need to be ported to C99.

As noted in a message on the LLVM forum site, the Autoconf scripts that are included with a project may rely on the implicit declarations, and any compiler warnings or errors may get lost because of the way Autoconf is run. Tools are being developed to help find the problem code without upsetting the Autoconf apple cart. Collaboration with other distributions should also help in that process, the proposal said.

The other changes would fix some crufty constructs. The bool, true, and false keywords would be enforced; code that defines its own versions of those symbols would need to change. In addition, GCC does not treat the assignment of pointers to int variables as an error, which needs to be fixed. Old-style function definitions, such as:

    int sum(a, b)
        char *a;
        int b;
    { ...

would need cleaning up as well, the proposal said. And finally, the use of empty parentheses on function declarations needs to be tightened up:

In earlier C versions, a declaration int function() declares function as accepting an unspecified number of arguments of unknown type. This means that both function(1) and function("one") compile, even though they might not work correctly. In a future C standard, int function() will mean that function does not accept any parameters (like in C++, or as if written as int function(void) in current C). Calls that specify parameters will therefore result in compilation errors.
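For comparison, here is a minimal sketch of what the prototype-style equivalents might look like. The body of sum() is a placeholder, since the original is elided above, and answer() is a hypothetical function added only to show the explicit (void) parameter list.

```c
#include <string.h>

/* Prototype-style replacement for the old-style definition
 *     int sum(a, b) char *a; int b; { ... }
 * The parameter types now appear in the declarator itself, so every
 * call can be checked. The body here is only a placeholder. */
int sum(char *a, int b)
{
    return (int)strlen(a) + b;
}

/* "(void)" states explicitly that the function takes no arguments, so
 * a call such as answer(1) or answer("one") is rejected at compile
 * time in every C version. */
int answer(void)
{
    return 42;
}
```

Writing (void) today gives the same meaning under both the current and the future rules, so code ported this way should not need to be touched again.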

The benefits for Fedora are twofold, according to the proposal. Bugs that are sometimes hard to track down and "look eerily like compiler or ABI bugs" will be avoided because warnings that might be easily overlooked would become errors. In addition, it is believed that some of these legacy features are holding back progress in GCC. The proposal is looking forward to the release of GCC 14, which is expected to be included in Fedora 40. There is a belief that GCC 14 will disable support for those features by default, but even if that does not happen, Fedora 40 would change its defaults so that regressions are not introduced afterward.

Questions

Daniel P. Berrangé noted that Autoconf difficulties immediately sprang to mind when he saw the proposal. He pointed out that other build systems that probe for features by compiling up test programs of various sorts may also be affected. Weimer said that he had already found a problem of that sort in the Python setuptools. It generates test programs to determine whether certain functions are present in the environment, but those programs rely on implicit int, so they may fail with a strict compiler, but not because the function of interest is missing.

Berrangé had also asked about how developers could test their code to see if it has any of these problems; it is not simply setting -std=gnu99 since that will still allow the old constructs (for now at least). Weimer spelled out the list of flags needed ("It's -Werror=implicit-int -Werror=implicit-function-declaration. Best to throw in -Werror=int-conversion -Werror=strict-prototypes -Werror=old-style-definition."); he also updated the proposal on the wiki so that each of the flags was tied to the obsolete feature it would catch.
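As a rough guide to what some of those flags expect, the following sketch shows compliant forms for three of the constructs: an explicit type instead of implicit int, an explicit cast for pointer-to-integer conversion, and the standard bool type. All of the function and variable names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* -Werror=implicit-int: spell out the type instead of "static counter". */
static int counter = 1;

int next_count(void)
{
    return counter++;
}

/* -Werror=int-conversion: a pointer stored in an integer needs an
 * explicit cast, and uintptr_t is wide enough to hold it. */
uintptr_t pointer_as_integer(const void *p)
{
    return (uintptr_t)p;
}

const void *integer_as_pointer(uintptr_t v)
{
    return (const void *)v;
}

/* bool, true, and false come from <stdbool.h> (and are keywords in
 * C23) rather than from project-local typedefs or macros. */
bool is_even(int n)
{
    return n % 2 == 0;
}
```

The file can be checked by compiling it with the flags quoted above, for example with "gcc -c" plus the five -Werror= options.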

Alexander Sosedkin wondered why the porting effort couldn't be distributed and proceed at a more leisurely pace. To start, every package could be marked as not supporting the change and the defaults for GCC could be switched for all packages that do support it; nothing would change at that point, but packages could slowly make the changes needed and toggle their flag. Though Weimer wishes it could be done that way, he really does not think that a distributed porting effort is feasible; "we'd have to teach many more people about C arcana and autoconf corner cases. I don't think that's a good learning investment to be honest."

Vit Ondruch asked about the status of the effort and the number of packages that will be affected by it. Weimer said that he was still figuring that out, but his preliminary run found about 10% of the Fedora packages were affected, which he believes is an overestimate. That certainly seems like a daunting number, though, since Fedora has more than 60,000 packages in its repository, the bulk of which are written in C.

There were no major complaints heard about the change in the thread. Gentoo is working on a similar set of changes for Clang 16, which already rejects the legacy C constructs; that should cut the work roughly in half, with luck. More distributions joining the party could reduce it further still. The Fedora Engineering Steering Committee (FESCo) will need to consider the proposal to decide whether the distribution should pursue it; as yet, the proposal is still in the discussion phase. It's a lot of work—thankless work, in truth—but would help modernize the code for lots of projects. Fedora would obviously benefit from that modernization as well.




Modernizing Fedora's C code

Posted Nov 3, 2022 0:58 UTC (Thu) by sam_c (subscriber, #139836) [Link] (15 responses)

Thanks for shining a light on the ongoing hassle we're facing. I'm working on the effort for Gentoo and have spoken with Florian about his side over in Fedora land. I wrote a bit about the situation for Gentoo at https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082...

So far, in Gentoo, we've come across several interesting bugs:
- SDL thought no joystick support existed (https://github.com/libsdl-org/SDL/pull/6217)
- zsh stopped installing any extensions and would hang at runtime (https://bugs.gentoo.org/869539)
- apr would install but cause apr-util build to loop forever (https://bugs.gentoo.org/870004)

Ultimately, as noted in the article (and in various talks of late, like Florian's talk at Plumbers in 2021: https://youtu.be/q5itHU2T5xU?t=2579), these issues often caused runtime problems before they became fatal in Clang 16 & possibly the future GCC 14 release. They need to be fixed, but they're still not easy to find.

The super hard part is configure tests silently failing and changing behaviour. I'm playing with diffing `configure` output before/after, while Florian is looking to specifically log errors from the compiler (by patching it) for missing prototypes where we *know* on a given system they're present (and hence a missing include or whatever, not just a deliberately missing function that's BSD-only).

The build failures in a particular package are easy if they're within its source code; they're not so easy if it's because of failed feature detection in either its configure script, or worse, another package's.

Florian and I have started a mailing list (which we haven't posted to yet, but I'd rather plug it now because we really need the help on this) at lists.linux.dev: https://lore.kernel.org/c-std-porting/. Please consider subscribing and helping out if you're comfortable with C.

Modernizing Fedora's C code

Posted Nov 3, 2022 0:59 UTC (Thu) by sam_c (subscriber, #139836) [Link]

Modernizing Fedora's C code

Posted Nov 3, 2022 6:04 UTC (Thu) by xen0n (subscriber, #99406) [Link] (1 responses)

As an interesting side note, the ongoing effort of porting various Linux distros (including Gentoo of course) to the emerging LoongArch architecture would actually help here. Because the *first* LLVM/Clang to work on the architecture would be version 16. We already have caught many gcc-13-related changes in the wild, for that matter, and will surely continue to do so with clang-16.

And I find it so much more rewarding that work on such a niche architecture could benefit the Linux ecosystem as a whole. Keep on compiling!

Modernizing Fedora's C code

Posted Nov 3, 2022 15:46 UTC (Thu) by willy (subscriber, #9762) [Link]

Over twenty years ago, I went through much the same exercise for much the same reason with Itanium & PA-RISC. It was more C++ than C, but not being able to use gcc-2.95 forced us to fix a lot of crusty old code. It was a slog, but I'm glad you find this kind of exercise rewarding!

Modernizing Fedora's C code

Posted Nov 3, 2022 14:03 UTC (Thu) by khim (subscriber, #9252) [Link] (5 responses)

> The super hard part is configure tests silently failing and changing behaviour.

Why is it hard? I have done such things before. They are tedious, sure, but “hard”? How? Why?

Just create a wrapper which would call gcc twice (first with added -Werror=implicit-int -Werror=implicit-function-declaration and then normally) and log commands somewhere if one call succeeds and the other call fails.

Maybe it would be a good idea to also collect error messages and ensure they match, too.

Then look on these logs and tweak source till they are empty.

> The build failures in a particular package are easy if they're within its source code, they're not so easy if it's because of failed feature detection in either its configure script, or worse, another package's.

That's why you use the result of the second call, the one without -Werror options. Only when you are sure there is no difference in option detection and you are sure there are no problems do you turn these options on for real.
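The core of such a wrapper could be sketched roughly as follows. This is an assumption-laden illustration, not the commenter's tool: it assumes a POSIX system with posix_spawnp(), it assumes argv[0] already names the real compiler (for example a renamed gcc.real), and run_ok(), with_strict_flags(), and strict_run_differs() are hypothetical names.

```c
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] with the NULL-terminated argument vector; return 1 if it
 * exited with status 0, and 0 otherwise. */
int run_ok(char *const argv[])
{
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return 0;
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Copy argv (argc >= 1) with the strict flags inserted after argv[0].
 * The caller frees the returned array; the strings are shared. */
char **with_strict_flags(int argc, char *const argv[])
{
    static char *const strict[] = {
        "-Werror=implicit-int",
        "-Werror=implicit-function-declaration",
    };
    const int n = sizeof strict / sizeof strict[0];
    char **out = malloc((size_t)(argc + n + 1) * sizeof *out);

    if (out == NULL)
        return NULL;
    out[0] = argv[0];
    memcpy(out + 1, strict, sizeof strict);
    memcpy(out + 1 + n, argv + 1, (size_t)(argc - 1) * sizeof *out);
    out[argc + n] = NULL;
    return out;
}

/* Return 1 and log the command when the strict and normal runs
 * disagree. The normal run goes last so that its output files are the
 * ones the build keeps. */
int strict_run_differs(int argc, char *const argv[], FILE *log)
{
    char **strict_argv = with_strict_flags(argc, argv);
    int strict_ok, normal_ok;

    if (strict_argv == NULL)
        return 0;
    strict_ok = run_ok(strict_argv);
    normal_ok = run_ok(argv);
    free(strict_argv);
    if (strict_ok == normal_ok)
        return 0;
    for (int i = 0; i < argc; i++)
        fprintf(log, "%s%s", argv[i], i + 1 < argc ? " " : "\n");
    return 1;
}
```

A real wrapper would also need a main() that substitutes the real compiler's name into argv[0], exits with the status of the normal run, and copes with compilations that read their source from standard input, which cannot simply be run twice.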

Modernizing Fedora's C code

Posted Nov 3, 2022 22:00 UTC (Thu) by sam_c (subscriber, #139836) [Link] (3 responses)

> Why is it hard? I have done such things before. They are tedious, sure, but “hard”? How? Why?

"Hard" in that there's a lot of work, not a lot of people helping with it right now, and not all packages follow a standard pattern. I'm glad you've done it before and I welcome your help in our efforts! Also, there's a lot of packages in the world, and therefore a lot of logs to sift through and actually fix the problems in.

1. Not everything uses `autotools` so you can't just wrap for e.g. configure. Of course, in Gentoo, it's easy enough for us to wrap anything called in our `src_configure` phase. An example of this is the bug in the article: distutils (https://github.com/pypa/setuptools/issues/3648).

2. Older `autotools` doesn't create a config.log. Related to #1, as different build systems behave differently. They may store their results in a different place.

3. autotools and other build systems may generate implicit function declaration errors intentionally when testing for e.g. BSD-only functions. This is noise and makes it harder to identify "real" breakage.

4. Testing every build combination of every package is impossible. In Gentoo, for example, PHP has 82 or so USE flags (different build options). We therefore can't test all combinations ourselves. Making a test for these issues programmatically so that users are warned if their configuration might be broken is possible (and will likely be done), but reducing false positives is important to avoid user confusion.

5. As noted in another comment, the shipped versions of autoconf (including the latest version 2.71) emit declarations which are broken with `-Werror=strict-prototypes`.

6. Related to #4: suppose we missed a problem in a build of a library. A user then reports a problem in a consumer of that library. It's not always going to be obvious that the library was miscompiled, especially if this is several years down the line.

I'm not claiming this is all intractable, but it's also not easy, and it's also not quick.

Modernizing Fedora's C code

Posted Nov 3, 2022 22:28 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

> I'm not claiming this is all intractable, but it's also not easy, and it's also not quick.

It definitely wouldn't be quick and that's why I think it's a good idea to do a bit more infrastructure work before starting to look at logs.

#1: I'm not proposing to wrap configure. Not even close. I propose to make wrapper for gcc (and clang if it's used). You either specify it as compiler or, if even that's hard, rename old gcc to something like gcc.real and put wrapper in that place instead.

The only trouble would be the fact that Gentoo wouldn't allow wrapper to just store logs in some random file, but that's not a big deal: you can send them by connecting to 127.0.0.1 and talking to some server, e.g.

#2 and #3 become irrelevant, because you are not even comparing logs. You are just comparing behavior of two gcc calls. And if they lead to the same result (both attempts compiled conftest successfully or both failed) then you don't even need to do anything about that case. But if one succeeded and one failed then you need to look and see what happens there.

#4 is indeed a big problem, but that's a Gentoo-specific issue that I have no idea how to fix. I only needed to ensure that after a switch from glibc to musl in some pretty convoluted embedded image builder there would be no changes in behavior (or at least we would know what has changed). It's a bit easier than what you attempted, because there are no 82 use flags.

#5 you would, probably, require some changes in autotools, but is it really that problematic? I guess that's something I haven't actually needed, but without working fix it's hard to even start.

#6 is, again, the same problem as #4. You can not ensure there would be no changes in all possible combinations of use flags, but at least you can ensure that tested combinations work fine.

Modernizing Fedora's C code

Posted Nov 3, 2022 22:36 UTC (Thu) by sam_c (subscriber, #139836) [Link] (1 responses)

1. I said "wrap for". The point being, if you want to just find 'configure' or configure-ish failures, and avoid this affecting package builds (which makes things awkward because you then can build things which depend on it), it can be a problem. We do sandbox builds including network but yes, it'll be doable to store in a file somewhere.

2/3. Right, it's a good idea. You indeed don't have to worry about diffing later, you can import if something simply doesn't compile with the second compiler. I'll go play with it.

4. It's really just the nature of the large amount of software out there. Some of these bugs will end up hitting other distros anyway.

5. Yes, it is, because you have to go around running autoreconf in tonnes of packages when the configure scripts in the release tarball are stale (and it's not fixed in an autoconf release, so new releases of software will continue to be broken).

Would it perhaps have been more acceptable to you if I'd said "complex" rather than "hard", as there's many factors? :)

Modernizing Fedora's C code

Posted Nov 4, 2022 15:52 UTC (Fri) by mbunkus (subscriber, #87248) [Link]

What's even worse is that autoconf versions aren't always compatible. You cannot simply take a years-old configure.ac source file & process it with the newest, bug-fixed autoconf release. There are several breaking changes.

Modernizing Fedora's C code

Posted Nov 7, 2022 19:32 UTC (Mon) by fw (subscriber, #26023) [Link]

The wrapper approach still over-reports issues somewhat because -Werror=implicit-function-declaration moves errors from link time to compile time. Many configure-like checks perform just a single gcc invocation even for link checks. But separate invocations will still result in false reports (because the compile stage used to pass).

In practice, there is not that much of a difference to a single-pass approach, logging the errors. A lot of the errors are caused by a missing #include <stdio.h> for exit, and require manual intervention anyway. There are many functions which are expected to be implemented, but we can maintain a list of those.

Modernizing Fedora's C code

Posted Nov 3, 2022 19:49 UTC (Thu) by madscientist (subscriber, #16861) [Link] (5 responses)

It seems like it should be feasible to run configure with the original flags, then re-run it with the strict flags, and simply compare the resulting config.h (or possibly config.status, with some fuzzing) to see that features are not being incorrectly lost.

Modernizing Fedora's C code

Posted Nov 3, 2022 20:20 UTC (Thu) by zaitseff (subscriber, #851) [Link] (3 responses)

I use Autoconf and Automake in my small Star Traders game. My code is written in C99, but when I tried running ./configure with the flags recommended in the article, configure didn't get very far at all:

$ wget -N https://ftp.zap.org.au/pub/trader/unix/trader-7.18.tar.xz
$ tar xvf trader-7.18.tar.xz
$ cd trader-7.18
$ ./configure CFLAGS="-Werror=implicit-int -Werror=implicit-function-declaration -Werror=int-conversion -Werror=strict-prototypes -Werror=old-style-definition"
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... unsupported
checking for gcc option to enable C99 features... unsupported
checking for gcc option to enable C89 features... unsupported
checking whether gcc understands -c and -o together... yes
checking whether the compiler is clang... no
checking for compiler option needed when checking for declarations... none
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
configure: error: requires an ISO/IEC 9899:1999 (C99) compiler

The problem is that even when configure tries the -std=gnu11 flag for GCC, it still has K&R-style function declarations in conftest.c:

configure:5405: gcc -std=gnu11 -c -Werror=implicit-int -Werror=implicit-function-declaration -Werror=int-conversion -Werror=strict-prototypes -Werror=old-style-definition  conftest.c >&5
conftest.c:26:14: error: function declaration isn't a prototype [-Werror=strict-prototypes]
   26 | static char *e (p, i)
      |              ^
conftest.c: In function 'e':
conftest.c:26:14: error: old-style function definition [-Werror=old-style-definition]
cc1: some warnings being treated as errors

Given that this package is in Fedora (and many other distros), happy to help solve the problem! And yes, I recognise I might be doing something wrong...

Modernizing Fedora's C code

Posted Nov 3, 2022 21:50 UTC (Thu) by sam_c (subscriber, #139836) [Link] (1 responses)

Thank you for your interest!

There's two problems here:
1. I think you might have generated `configure` using an older `autoconf`. But even newer `autoconf` will generate code which isn't strict-prototypes compliant (see below). Try `autoreconf -fi` and see if it's any better. Consider publishing a new release if it is.

2. Something which makes all of this harder to test for is that you can't use `-Werror=strict-prototypes` for `configure` because some tests rely on it failing/succeeding.

https://git.savannah.gnu.org/cgit/autoconf.git/commit/?id... could be applied to your distribution's `autoconf` to make sure it generates decent code.

As a workaround: omit -Werror=strict-prototypes when passing CFLAGS to `configure`. You can try then passing them with make/in the environment after `configure` and hope it gets picked up.

Modernizing Fedora's C code

Posted Nov 4, 2022 20:40 UTC (Fri) by zaitseff (subscriber, #851) [Link]

Thanks for your comments. I guess I won't be the only package maintainer in this situation!

> I think you might have generated configure using an older autoconf.

I used the latest versions of autoconf (2.71) and automake (1.16.5) that were in Debian Sid at the time (August 2022). These versions are still the latest as released upstream. So I guess I might have to wait for new versions to be released. I'll also check whether Fedora includes/will include additional patches that Debian does not.

> Something which makes all of this harder to test for is that you can't use -Werror=strict-prototypes for configure because some tests rely on it failing/succeeding.

In actual fact, not only can I not use -Werror=strict-prototypes, I also must drop -Werror=old-style-definition. But then my package compiles, installs and actually works! :-)

Modernizing Fedora's C code

Posted Nov 4, 2022 14:25 UTC (Fri) by madscientist (subscriber, #16861) [Link]

Clearly, I meant that once the basic issues were worked out; this may well require a new release of autoconf.

If things outright fail, that's straightforward enough to resolve (obviously assuming sufficient manpower). My understanding is that this wasn't the concerning part. My understanding is that the concern was when a test for some feature fails not because that feature is not available but because the test doesn't compile properly.

This is a kind of "silent error" where the build succeeds but the resulting package is neutered and important functionality is missing, because configure incorrectly thought the feature was not available.

For this it seems like running the configure both ways and comparing the config.h / config.status would let you know if there was such a "silent error".
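A comparison of that kind could be sketched as below, under the assumption that the two config.h files have already been read into strings and that a detected feature appears as a "#define" line (Autoconf writes a commented-out #undef for features that were not found). count_lost_defines() and has_line() are hypothetical names, and the sketch ignores config.status entirely.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the first len bytes of `line` occur as a complete line
 * somewhere in `text`. */
static int has_line(const char *text, const char *line, size_t len)
{
    while (*text != '\0') {
        const char *end = strchr(text, '\n');
        size_t n = end ? (size_t)(end - text) : strlen(text);

        if (n == len && memcmp(text, line, len) == 0)
            return 1;
        text += n;
        if (*text == '\n')
            text++;
    }
    return 0;
}

/* Count "#define" lines present in `before` (the permissive run) but
 * absent from `after` (the strict run), printing each one to `report`
 * unless it is NULL. */
int count_lost_defines(const char *before, const char *after, FILE *report)
{
    int lost = 0;

    while (*before != '\0') {
        const char *end = strchr(before, '\n');
        size_t n = end ? (size_t)(end - before) : strlen(before);

        if (n >= 8 && memcmp(before, "#define ", 8) == 0 &&
            !has_line(after, before, n)) {
            lost++;
            if (report != NULL)
                fprintf(report, "lost: %.*s\n", (int)n, before);
        }
        before += n;
        if (*before == '\n')
            before++;
    }
    return lost;
}
```

A nonzero count does not prove a silent error, since a define can legitimately change value between runs, but each reported line is a candidate worth checking by hand.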

Modernizing Fedora's C code

Posted Nov 7, 2022 19:38 UTC (Mon) by fw (subscriber, #26023) [Link]

It's not just autoconf. For example, has_function in Python's setuptools appears to be quite broken even for C89 compilers, and it gets worse with the removal of implicit function declarations.

A side effect of this endeavor is that we look at parts of the build process that are usually ignored and taken for granted. For example, I also found a garbled line in configure.ac of liblognorm, causing the clock_gettime check to always fail (even on glibc 2.17), resulting in unnecessary linking with -lrt. The tricky part will be to not get sucked into side projects to address such issues.

Modernizing Fedora's C code

Posted Nov 3, 2022 11:05 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (4 responses)

> Old-style function definitions

Dear God. I think even Nethack, once the great swamp of compatibility cruft for antediluvian K&R 1st edition compilers and steam-driven Unices, has got rid of those now!

Modernizing Fedora's C code

Posted Nov 3, 2022 13:30 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

It's a shame, because I find the old style a bit more readable. It would have been better for ANSI C to start doing the type checking even when the old style was used, rather than forcing a switch to a different syntax to get checking of function arguments. Much too late now of course.

Modernizing Fedora's C code

Posted Nov 3, 2022 15:42 UTC (Thu) by dskoll (subscriber, #1630) [Link] (1 responses)

How could it do type-checking with the old syntax? You need a function prototype to do type-checking, and that can't be written with the old syntax (and making it possible would be a nightmare for the parser.)

Modernizing Fedora's C code

Posted Nov 4, 2022 22:34 UTC (Fri) by floppus (guest, #137245) [Link]

For static functions, the function definition can tell you the parameter types. It's not uncommon in older codebases for static functions to always be defined before they're used, and not to have prototypes (I think K&R C doesn't allow forward declarations of static functions at all.) External functions, of course, usually have prototypes in a header file, guarded by #ifdef __STDC__ or something.

GCC doesn't check argument types against an earlier old-style definition (though it does check old-style parameters against an earlier prototype), and -Wmissing-prototypes doesn't complain about missing prototypes for static functions, either.

Modernizing Fedora's C code

Posted Nov 3, 2022 17:35 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

Modernizing Fedora's C code

Posted Nov 4, 2022 4:02 UTC (Fri) by wezm (subscriber, #139623) [Link]

> which is currently slated for the northern-hemisphere Spring of 2024

Thank you for explicitly naming the hemisphere instead of just assuming northern like so many season based calendar references I see.

Modernizing Fedora's C code

Posted Nov 7, 2022 12:59 UTC (Mon) by eru (subscriber, #2753) [Link] (3 responses)

> ... is complicated by the use of Autoconf in many of the packages ...

I have long wondered what is the point of Autoconf (and other build tools that do similar autoprobing) these days. The environments and compilers are much more standard now than when Autoconf was born and had to deal with a zoo of twisty unix variants, all different. Now the configure step sometimes takes as much time as the actual compilation while it checks for the presence of features every realistic environment has had for the past 30 years. I would like to just say "I have Linux and GCC, build it".

Modernizing Fedora's C code

Posted Nov 7, 2022 14:02 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

Checking for functions and their signatures doesn't make a lot of sense anymore, but there are still open questions:

- musl or glibc?
- is dep X available? is it a sufficient version?
- is libdl a separate library?

My pet peeve is when I see checks being performed that get put into `config.h` and then…never used. Those patches at least tend to get some traction even if removing unnecessary-but-used checks is not as well-received.

And, not that it's relevant for autoconf, but Windows support does tend to require probing (or embedding tables for things like "the XYZ toolchain finally supports snprintf"). These checks can be dropped as older toolchain support is dropped.

Modernizing Fedora's C code

Posted Nov 8, 2022 13:45 UTC (Tue) by eru (subscriber, #2753) [Link]

Maybe we could standardise something like the triplet defining the environment in GCC, but only with compiler and libc: compiler-vers-libc-vers. "uname" will provide the architecture and OS on POSIX-compatible hosts.

Existence of external dependencies could be pre-checked with pkg-config and/or the host's package manager. If the package is not found, warn, but barge on regardless, in case it was installed without such metadata. (The build would then fail if it does not exist).

Modernizing Fedora's C code

Posted Nov 7, 2022 14:11 UTC (Mon) by farnz (subscriber, #17727) [Link]

A big part of it is that nobody uses site defaults to set up a site-wide cache file for configure to use, so nobody's configure scripts are set up to cope with a site-wide cache, and it all thus falls apart.

In theory, a Linux distribution could take responsibility for this, but the mess that results from autoconf allowing you to override linkers, compilers etc via environment variables makes it problematic.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds