
CPython, C standards, and IEEE 754

By Jake Edge
March 2, 2022

Perhaps February was "compiler modernization" month. The Linux kernel recently decided to move to the C11 standard for its code; Python has just undergone a similar process for determining which flavor of C to use for building its CPython reference implementation. A calculation in the CPython interpreter went awry when built with a pre-release version of the upcoming GCC 12; that regression led down a path that ended up with the adoption of C11 for CPython as well.

A bug that was fixed in early February started the ball rolling for Python. Victor Stinner encountered a GCC regression that caused CPython not to get the expected IEEE 754 floating-point NaN (not a number) value in a calculation. An LWN article sheds some light on NaNs (and how they are used in Python) for those who need a bit more background. The calculation was using the HUGE_VAL constant, which ISO C defines as a large positive double value (positive infinity on IEEE 754 systems); the code set the value of the internal Py_NAN constant used by the interpreter to HUGE_VAL*0, which should, indeed, evaluate to a NaN. Multiplying infinity by zero is defined to be a NaN in IEEE 754; multiplying it by any nonzero finite number yields an infinity.
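The arithmetic behind that calculation can be sketched in a few lines of C. This is only an illustration, assuming an IEEE 754 platform where HUGE_VAL is positive infinity; the helper names are hypothetical and are not taken from CPython.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: builds a NaN the way the text describes, by
 * multiplying HUGE_VAL (positive infinity under IEEE 754) by zero.
 * The volatile operand discourages compile-time folding, so the
 * multiplication is more likely to be performed at run time. */
static double nan_from_huge_val(void)
{
    volatile double zero = 0.0;
    return HUGE_VAL * zero;
}

/* Infinity times a nonzero finite factor stays infinite; only the
 * sign can change. */
static double scaled_infinity(double factor)
{
    assert(factor != 0.0 && isfinite(factor));
    return HUGE_VAL * factor;
}

/* A NaN is the only floating-point value that compares unequal to
 * itself, which gives a check that does not need isnan(). */
static bool is_nan_by_comparison(double x)
{
    return x != x;
}
```

A compiler that mishandles the multiplication, as the GCC pre-release apparently did, would make a check like is_nan_by_comparison(nan_from_huge_val()) fail.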

During his investigation of the problem, Stinner found that instead of the calculation, Python could simply use the NAN constant defined in <math.h>—as long as a C99 version of the header file was used. As part of the bug discussion, Petr Viktorin said that PEP 7 ("Style Guide for C Code") should be updated to reflect the need for the C99 header file. So Stinner duly created a pull request for a change to the PEP, but Guido van Rossum said that a change of that nature should be discussed on the python-dev mailing list.

That led Stinner to post a message to discuss the change on February 7. As it turns out, Stinner had fixed two bugs that require parts of the C99 math API: in addition to the NaN problem, bug 45440 reported a problem with the CPython Py_IS_INFINITY() macro, and the fix for that also involved using the C99 <math.h>. As Stinner noted, C99 is now 23 years old, and support for it in compilers is widespread; GCC, Clang, and Microsoft Visual C (MSVC) all support the needed features.
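As a rough sketch of what relying on the C99 <math.h> buys, the following compares the standard isinf() macro with a hand-rolled range test. Both helpers are hypothetical illustrations; neither is the actual definition of Py_IS_INFINITY(), before or after the fix.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hand-rolled test of the kind a pre-C99 code base might carry:
 * anything beyond the largest finite double is an infinity. A NaN
 * fails both comparisons, so it is reported as not infinite. */
static bool is_infinity_by_range(double x)
{
    return x > DBL_MAX || x < -DBL_MAX;
}

/* C99 version: isinf() is a standard classification macro. */
static bool is_infinity_c99(double x)
{
    return isinf(x) != 0;
}

/* Cross-check: on an IEEE 754 platform the two should agree. */
static bool is_infinity(double x)
{
    bool result = is_infinity_c99(x);
    assert(result == is_infinity_by_range(x));
    return result;
}
```

The standard macro leaves the details to the C library and compiler, which is the appeal of depending on C99 rather than maintaining local workarounds.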

Floating point

Mark Dickinson pointed out that the existence of the NAN constant is not required by C99 directly; it is only present if IEEE 754 floating point is enabled as well. He thought that it made sense for CPython to require IEEE 754, but wondered whether Python, the language, should also require it. Stinner said that all modern computers support IEEE 754; even embedded devices without a floating-point unit (FPU) typically support it in software. "Nowadays, outside museums, it's hard to find computers which don't implement IEEE 754."
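Dickinson's point can be illustrated with a small compile-time probe. C99 defines the NAN macro only when the implementation supports quiet NaNs for float, and __STDC_IEC_559__ only when it claims conformance to the standard's IEEE 754 annex. The sketch below assumes nothing beyond those two standard macros and <float.h>; the EXAMPLE_ names and helper functions are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* NAN exists only if the implementation supports quiet NaNs. */
#ifdef NAN
#define EXAMPLE_HAVE_NAN_MACRO 1
#else
#define EXAMPLE_HAVE_NAN_MACRO 0
#endif

/* Annex F conformance claim. Some compilers leave this undefined even
 * when their floating point is IEEE 754 in practice, so a value of 0
 * here is not proof of a non-IEEE platform. */
#ifdef __STDC_IEC_559__
#define EXAMPLE_CLAIMS_IEC_559 1
#else
#define EXAMPLE_CLAIMS_IEC_559 0
#endif

/* Heuristic: does double have the shape of IEEE 754 binary64? */
static bool double_looks_like_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 &&
           DBL_MAX_EXP == 1024 && DBL_MIN_EXP == -1021;
}

/* Returns a quiet NaN where the NAN macro is available. */
static double example_nan(void)
{
#if EXAMPLE_HAVE_NAN_MACRO
    double value = NAN;      /* float constant converted to double */
    assert(value != value);  /* a NaN never equals itself */
    return value;
#else
    assert(!"no quiet NaN available on this platform");
    return 0.0;
#endif
}
```

A project that requires a NaN, as CPython ended up doing, could turn the missing-macro case into a build error instead of a run-time assertion.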

Stinner was in favor of requiring IEEE 754 for CPython; Gregory P. Smith agreed. Brett Cannon wondered if there was even any ability to test with systems that lacked the support:

Do we have a buildbot that has a CPU or OS that can't handle IEEE-754? What are the chances we will get one? If the answers are "none" and "slim", then it seems reasonable to require NaN and IEEE-754.

Stinner reported that all of the buildbot machines supported IEEE 754, so the path was clear to require it. In terms of the Python language, Christopher Barker said that IEEE 754 support should not be required for all implementations of Python, but that it should be recommended. Steve Dower agreed that leaving it up to Python implementations made sense: "Otherwise, we would prevent _by specification_ using Python as a scripting language for things where floats may not even be relevant." He said that making it a requirement would inhibit adoption: "The more 'it's-only-Python-if-it-has-X' restrictions we have, the less appealing we become."

Which C?

Switching to C99 makes sense if the compilers being used to build CPython support it, Cannon said. Viktorin asked about MSVC's support for all of C99; he did not find any documentation saying that it did, so it might be better to consider C11, which is supported. "Do we need to support a subset like 'C99 except the features that were removed in C11'?" Dower, who works on Python at Microsoft, said that he had not found an answer to the C99 question either:

All the C99 library is supposedly supported, but there are (big?) gaps in the compiler support. Possibly these are features that were removed in C11? I don't know what is on that list.

[...] Personally, I see no reason to "require" C99 as a whole. We have a style guide already, and can list the additional compiler features that we allow along with guidance on updating existing code and ensuring compatibility.

I don't see much risk requiring the C99 standard library, though. It's the compiler features that seem to have less coverage.

Stinner suggested tying the wording to what was supported in MSVC, but H. Vetinari thought a better formulation might be "'C99 without the things that became optional in C11', or perhaps 'C11 without optional features'". That led Viktorin to wonder why C11 would not make a better target: "[...] the main thing keeping us from C99 is MSVC support, and since that compiler apparently skipped C99, should we skip it as well?"
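For readers wondering what "optional" means here: C11 lets a compiler omit certain features as long as it defines a matching __STDC_NO_*__ macro. The following is a minimal sketch of how code might probe for that; the struct and function names are made up for illustration, and only four of the optional features are covered.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* __STDC_VERSION__ is 201112L for C11; later standards use larger values. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define EXAMPLE_HAVE_C11 1
#else
#define EXAMPLE_HAVE_C11 0
#endif

#if EXAMPLE_HAVE_C11
/* static_assert is a mandatory C11 feature, so it is safe to use once
 * the version check has passed. */
static_assert(sizeof(long long) * CHAR_BIT >= 64,
              "long long must hold at least 64 bits");
#endif

/* The optional-feature fields are only meaningful when c11 is true. */
struct example_c11_report {
    bool c11;
    bool vla;            /* variable-length arrays */
    bool atomics;        /* <stdatomic.h> and _Atomic */
    bool threads;        /* <threads.h> */
    bool complex_types;  /* <complex.h> */
};

static struct example_c11_report example_probe_c11(void)
{
    struct example_c11_report r = { EXAMPLE_HAVE_C11, true, true, true, true };
#ifdef __STDC_NO_VLA__
    r.vla = false;
#endif
#ifdef __STDC_NO_ATOMICS__
    r.atomics = false;
#endif
#ifdef __STDC_NO_THREADS__
    r.threads = false;
#endif
#ifdef __STDC_NO_COMPLEX__
    r.complex_types = false;
#endif
    return r;
}

/* Code that sticks to the mandatory part of C11 only needs the version
 * check; the optional flags do not matter to it. */
static bool example_mandatory_subset_ok(void)
{
    return example_probe_c11().c11;
}
```

Code written to "C11 without optional features" never has to consult the optional flags at all, which is much of the appeal of that formulation.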

Cannon said that he found a list of optional C11 features, none of which were really needed; if the "C11 without optional features" flavor is widely supported, as it would seem that it is, "I think that's a fine target to have". Meanwhile, both Inada Naoki and Viktorin were excited about using C11's anonymous union feature in CPython.
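An anonymous union is a union member with no name, so its fields are accessed as if they belonged to the enclosing struct. A minimal sketch, using a hypothetical tagged value rather than any real CPython structure:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value for illustration only. */
enum example_kind { EXAMPLE_INT, EXAMPLE_DOUBLE };

struct example_value {
    enum example_kind kind;
    union {               /* anonymous: no member name after the brace */
        int64_t as_int;
        double as_double;
    };
};

static struct example_value example_from_int(int64_t i)
{
    struct example_value v;
    v.kind = EXAMPLE_INT;
    v.as_int = i;         /* direct access, no intermediate union name */
    return v;
}

static struct example_value example_from_double(double d)
{
    struct example_value v;
    v.kind = EXAMPLE_DOUBLE;
    v.as_double = d;
    return v;
}

static double example_to_double(struct example_value v)
{
    assert(v.kind == EXAMPLE_INT || v.kind == EXAMPLE_DOUBLE);
    return v.kind == EXAMPLE_INT ? (double)v.as_int : v.as_double;
}
```

Before C11, standard C required the union to have a member name (say, u), so every access had to be spelled v.u.as_int; compilers offered the shorter form only as an extension.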

Viktorin also said that in order to keep the CPython public header files compatible with C++, anonymous unions could not be used in them, though Inada said that C++ does support them, "with some reasonable limitations". While CPython aims to be compatible with C++ at the API level, it is hard to completely specify what that means—or even which version of the C standard is supported—as Smith pointed out:

We're likely overspecifying in any document we create about what we require because the only definition any of us are actually capable of making for what we require is "does it compile with this compiler on this platform? If yes, then we appear to support it. can we guarantee that? only with buildbots or other CI [continuous integration]" - We're generally not versed in specific language standards (aside from compiler folks, who is?), and compilers don't comply strictly with all the shapes of those anyways for either practical or hysterical reasons. So no matter what we claim to aspire to, reality is always murkier. A document about requirements is primarily useful to give guidance to what we expect to be aligned with and what is or isn't allowed to be used in new code. Our code itself always has the final say.

The final result was a rather small patch to PEP 7 to say that CPython 3.11 and beyond use C11 without the optional features (and that the public C API should be compatible with C++). In addition, bug 46656 and a February 25 post from Stinner document the changes to the floating-point requirements; interestingly, they do not mention IEEE 754, just a requirement for a floating-point NaN. While it may have seemed like a bit of a yak-shaving exercise along the way, the GCC regression eventually led to a better understanding of which flavor of C is supported for building CPython, along with moving to a flavor from this century. All in all, a good "day's" work.




Posted Mar 2, 2022 23:52 UTC (Wed) by developer122 (guest, #152928) (5 responses)

In practical terms, I don't think it's possible to build python without IEEE 754 anyway. Someone tried it on a VAX a few years ago, and even though they were willing to wait months for it to compile the total lack of IEEE 754 made it completely impossible. It was stated that the only way this was going to work was if someone implemented software-emulated IEEE 754 for the VAX architecture and then ported python/the compiler to use it.

Posted Mar 3, 2022 2:42 UTC (Thu) by NYKevin (subscriber, #129325) (4 responses)

It is certainly impossible to build CPython without IEEE 754, as described in the article, but Python is the language, not the implementation. The article is (basically) saying that, if someone really wants to go to the obscene lengths required to completely re-implement Python from scratch in a no-floats environment, the language spec should not explicitly forbid such a thing. OTOH, nobody ever promised that was going to be *easy.*

Posted Mar 3, 2022 5:24 UTC (Thu) by developer122 (guest, #152928) (1 response)

I'm talking in practical terms. There is no implementation of python today (at least among the ones tested) that will compile and run on a VAX.

Posted Mar 5, 2022 17:46 UTC (Sat) by wtarreau (subscriber, #51152)

If I were certain it wouldn't take gigs of disk and hundreds of megs of RAM, I'd try on my old VAX with its 12 MB RAM just to see :-) But most likely gcc-2.95 will not build recent versions anyway.

Posted Mar 3, 2022 9:26 UTC (Thu) by taladar (subscriber, #68407) (1 response)

Why not? If all hardware out there today supports IEEE 754 (something that can be said for standards on very few other topics) what does Python gain from keeping their own language standard ambiguous on this matter?

Posted Mar 3, 2022 9:42 UTC (Thu) by NYKevin (subscriber, #129325)

The argument is, essentially, because that's how C works. C does not specify how floats behave, but every reasonable implementation on the planet follows IEEE 754 unless you pass a flag such as -ffast-math (or it's running on a VAX or something). Technically, if you try to compile CPython with -ffast-math, it'll probably produce an entirely functional binary, but the resulting Python interpreter might not strictly follow IEEE 754, depending on what optimizations the C compiler was able to make. But that's a nonstarter if you want to *officially* support non-754 floats in CPython, because -ffast-math does not (strictly) obey any standard whatsoever, so there's nothing you can reasonably test it against (i.e. you can't make a -ffast-math test suite as you can for IEEE 754). All you can really do is put a line in the documentation saying "here there be dragons."

Posted Mar 3, 2022 11:28 UTC (Thu) by ballombe (subscriber, #9523)

Note that there is a difference between hardware ieee754 support and software ieee754 support.
There are ARM processors without hardware ieee754 support but with software ieee754 support; of course, the performance is not the same.
Even VAX could in theory have software ieee754 support.

Posted Mar 9, 2022 23:52 UTC (Wed) by bartoc (guest, #124262)

C11 anonymous unions have been supported for a long time via "ms-extensions" on all three major compilers. That flag turns on just the ms-extensions that were probably good ideas in hindsight, not the really horrible ones. There's a gcc-only variant called plan9-extensions that's even cooler, and you can absolutely see the seeds of some Go features in that extension.

Sidenote: while msvc is pretty unlikely to ever support them, I actually don't think VLAs are that horrible of a feature. Compilers just have to emit stack probes, which they were all really bad about doing initially (and some still don't with default settings!). IMO the standard should have required them to emit them unconditionally (unless the compiler can prove the size is actually < 1 page, for sure). I should write a paper, tbh.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds