
The troubles with Boolean inversion in Python

By Jake Edge
February 27, 2026

The Python bitwise-inversion (or complement) operator, "~", behaves pretty much as expected when it is applied to integers—it toggles every bit, from one to zero and vice versa. It might be expected that applying the operator to a non-integer, a bool for example, would raise a TypeError, but, because the bool type is really an int in disguise, the complement operator is allowed, at least for now. For nearly 15 years (and perhaps longer), there have been discussions about the oddity of that behavior and whether it should be changed. Eventually, that resulted in the "feature" being deprecated, producing a warning, with removal slated for Python 3.16 (due October 2027). That has led to some reconsideration and the deprecation may itself be deprecated.

The problem was reported in 2011 by Matt Joiner who was surprised by the outcome of some tests that he ran:

    >>> bool(~True)
    True
    >>> bool(~False)
    True
    >>> bool(~~False)
    False
    >>> ~True, ~~True, ~False, ~~False
    (-2, 1, -1, 0)
That last example demonstrates how those unexpected results came about: True is effectively just an alias for one and False is zero. When those values are inverted, they do not really act in a Boolean kind of way. Python defines its integers as using two's complement representation, so the complement of one is -2 and the complement of zero is -1. In Python, any non-zero value is treated as true in a Boolean sense, so both results evaluate to true.
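
The arithmetic behind those results can be checked directly. What follows is a minimal sketch using only the standard library; the helper name invert() is hypothetical and exists only to silence the deprecation warning that Python 3.12 and later emit for "~" on a bool.

```python
import warnings


def invert(value):
    """Return ~value, silencing the DeprecationWarning for bools (3.12+)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return ~value


# For any int x, ~x == -x - 1 (the two's complement identity).
for x in (0, 1, 5, -7):
    assert invert(x) == -x - 1

# True and False are the ints 1 and 0, so the same identity applies.
results = [invert(True), invert(False)]
print(results)                     # [-2, -1]
print([bool(r) for r in results])  # [True, True]
```

Since -x - 1 is zero only when x is -1, no bool value can ever be complemented to something falsy.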

History

The bool type, True, and False were not added to the language until Python 2.3 in 2003, though the feature was infamously backported to the 2.2.1 bug-fix release prior to 2.3. PEP 285 ("Adding a bool type") described the feature in some detail; it is clear that using an integer value was done purposefully, for backward compatibility, at least in part. The PEP abstract explains:

The bool type would be a straightforward subtype (in C) of the int type, and the values False and True would behave like 0 and 1 in most respects (for example, False==0 and True==1 would be true) [...]
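
The consequences of that design are easy to observe in any current interpreter. The following sketch is illustrative only; the variable names and sample data are assumptions and not taken from the PEP.

```python
# bool is a subclass of int, so every bool is also an int.
assert issubclass(bool, int)
assert isinstance(True, int)

# The two values compare equal to 1 and 0 ...
assert True == 1 and False == 0

# ... and behave like them in arithmetic, indexing, and hashing.
total = True + True                            # 2
picked = ["no", "yes"][True]                   # "yes"
lookup = {1: "one"}[True]                      # "one", since hash(True) == hash(1)
count = sum(n % 2 == 0 for n in [1, 2, 3, 4])  # 2 even numbers
print(total, picked, lookup, count)
```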

The author of the PEP, Guido van Rossum, was the Python benevolent dictator for life (BDFL) at the time; the Review section of the PEP kind of foreshadows the problems that led him to step down from that role 16 years later:

I've collected enough feedback to last me a lifetime, so I declare the review period officially OVER. I had Chinese food today; my fortune cookie said "Strong and bitter words indicate a weak cause." It reminded me of some of the posts against this PEP... :-)

The PEP was silent about applying the complement operator to bool values, but the implementation allowed it. Joiner filed the bug in 2011 because he went looking for a C-like unary not operator ("!"), which is not present in the language, and ran into "~" instead. As Amaury Forgeot d'Arc pointed out, the logical not operator is what Joiner was seeking. The bug was closed the day after it was opened, because the behavior was deliberate.

But the problematic behavior popped up again in a 2019 bug report from Tomer Vromen, who noted that the bitwise and ("&") and or ("|") operators acted as expected (i.e. like the logical equivalents), while complement does not. In fact, the bitwise versions of the and/or operators returned a bool result, while "~True" returns an int -2 (and not True as the integer could be interpreted, or even False as the caller might expect). The bug report linked to a fairly lengthy python-ideas thread from 2016 that also discussed the problem. Both the bug and the thread noted that NumPy has a Boolean type that behaves as expected (at least by some) and returns False for "~numpy.bool_(True)".

In the thread, Van Rossum seemed to lean toward changing the behavior, but wanted to do it with a quick change for Python 3.6, skipping a deprecation cycle, or not at all. Python behavior seems fairly inconsistent, as he described:

To be more precise, there are some "arithmetic" operations (+, -, *, /, **) and they all treat bools as ints and always return ints; there are also some "bitwise" operations (&, |, ^, ~) and they should all treat bools as bools and return a bool. Currently the only exception to this idea is that ~ returns an int, so the proposal is to fix that.
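
The split Van Rossum describes can be verified by checking result types. This is a small sketch rather than anything from the thread; note that in Python 3 true division returns a float instead of an int, and the warning filter is only there because Python 3.12 and later warn about "~" on a bool.

```python
import warnings

# Binary bitwise operators on two bools return a bool.
assert type(True & False) is bool
assert type(True | False) is bool
assert type(True ^ True) is bool

# Arithmetic operators return an int (or a float for true division).
assert type(True + True) is int
assert type(True * False) is int
assert type(True / True) is float

# The complement is the odd one out: it returns an int.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # Python 3.12+
    complement = ~True
assert type(complement) is int and complement == -2

# Mixing a bool with a plain int falls back to int.
assert type(True & 1) is int
```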

More recently

The idea seems to have just died out in 2016, and again in 2019, but was resurrected by Tim Hoffmann in a 2022 comment on the 2019 bug report. He proposed that ~ be deprecated for the bool type, which Van Rossum endorsed, suggesting that the deprecation be added for the then-upcoming 3.12 release. Earlier, Van Rossum clearly did not want to change the type of the result of ~bool to be a bool:

Because bool is embedded in int, it's okay to return a bool value that compares equal to the int from the corresponding int operation. Code that accepts ints and is passed bools will continue to work. But if we were to make ~b return not b, that makes bool not embedded in int (for the sake of numeric operations).

Take for example

    def f(a: int) -> int:
        return ~a
I don't think it's a good idea to make f(0) != f(False).
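
The "embedding" argument can be made concrete: each operation on bools returns a value that compares equal to the result of the same operation on the underlying ints. The sketch below reuses f() from the quote; the other names are hypothetical and the "not" line only simulates what a logical-negation version of "~" would return.

```python
import warnings


def f(a: int) -> int:
    return ~a


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # Python 3.12+
    # Today a bool can stand in for the equal int without changing the result.
    same_for_zero = f(0) == f(False)   # -1 == -1
    same_for_one = f(1) == f(True)     # -2 == -2

# If ~b meant "not b", f(False) would be True (that is, 1) while f(0) stays -1.
hypothetical_f_false = not False
diverges = hypothetical_f_false != f(0)

# The binary operators already satisfy the embedding property.
for a in (False, True):
    for b in (False, True):
        assert (a & b) == (int(a) & int(b))
        assert (a | b) == (int(a) | int(b))
        assert (a ^ b) == (int(a) ^ int(b))

print(same_for_zero, same_for_one, diverges)   # True True True
```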

In 2022, though, he was in favor of deprecating the use of the complement operator on bool values, rather than switching to a bool return type for complement. In the discussions about the behavior over the years, the main downside to it is that it can be confusing to users and that there is seemingly no real use case for it. For those who do end up getting confused, it is clearly not the right tool for the job, but the fact that NumPy and other libraries have normalized using bitwise complement to mean not muddies the waters.

The deprecation warning was duly added to Python 3.12 in 2023 via a pull request from Hoffmann. It gives a lengthy explanation when the warning is issued:

DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
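
To see whether a given interpreter issues the warning, it can be recorded rather than printed. This is a sketch that assumes a Python version where "~" on a bool is still permitted; the variable names are illustrative. It also shows the two replacements that the message recommends.

```python
import sys
import warnings

flag = True

# Record any warnings triggered by the complement instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    raw = ~flag

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(sys.version_info[:2], "DeprecationWarning count:", len(deprecations))

# The two replacements named in the message work on every version.
negated = not flag       # logical negation: False
inverted = ~int(flag)    # bitwise inversion of the underlying int: -2
assert raw == inverted
```

Interpreters older than 3.12 should report a count of zero, since the warning did not exist there.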

One of the problems with deprecations is the visibility of the warnings; at various points, the DeprecationWarning exception was hidden by default because it too often was only seen by end users who were unable to fix the underlying problem. That changed back in 2017 to increase the visibility of the warnings, in part so that users could request fixes from library developers—deprecation in Python pops up fairly frequently in discussions about development of the language.

In August 2024, though, Barry Warsaw saw a GitHub email notification about the deprecation, which surprised him because he could not remember a wider discussion about it. He posted to the core development category to have that discussion, but he also wanted to talk about changes like this that can sometimes fly under the radar, so he started a parallel discussion as well. The question of "change visibility" seemed to reach a consensus that there was a problem in need of addressing, but there was less clarity on what might be done. Too much bureaucracy, in the form of PEPs or a more formalized change-management process, may negatively impact contributions, which largely come from volunteers; too little can lead to surprises like the deprecation of ~bool.

On the question of whether it should be deprecated at all, no real consensus was found, which has been the case throughout its history; some were strongly pro-deprecation because it is confusing and generally a footgun, while others lamented the inconsistency of only disallowing bitwise complement for the bool type and allowing all of the other arithmetic and bitwise operators.

Oscar Benjamin noted that "use of ~ for logical negation is widespread" in NumPy and SymPy. Antoine Pitrou pointed out that is because ~ can be overridden, unlike the logical not. Benjamin agreed, saying that PEP 335 ("Overloadable Boolean Operators") would have allowed NumPy and SymPy to take a different path, but it was eventually rejected in 2012. Both Benjamin and Pitrou did not think ~bool was particularly useful and were in favor of deprecation.
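
Pitrou's point can be illustrated with a toy container. The Mask class below is hypothetical and is not NumPy's or SymPy's API; it only shows that __invert__() may return any object, while "not" always goes through __bool__() and collapses to a single bool.

```python
class Mask:
    """A hypothetical element-wise Boolean container (not NumPy's API)."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __invert__(self):
        # ~mask can return any object, here a new element-wise negated Mask.
        return Mask(not v for v in self.values)

    def __bool__(self):
        # "not mask" goes through __bool__, which must return a single bool.
        return any(self.values)


mask = Mask([True, False, True])
flipped = ~mask
print(flipped.values)   # [False, True, False]
print(not mask)         # False: one bool for the whole container
```

That asymmetry is why array libraries reach for "~" when they want element-wise negation.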

On the flipside, Bjorn Martinsson provided some examples of how he uses ~ on Boolean values. They are probably kind of obscure, but he has even publicized a use of the technique. A few others popped up in the thread with use cases as well.

Hoffmann summarized the arguments that led him to propose the deprecation and author the code change to effect it. Since he believed it made sense to rid the language of this footgun, only two paths presented themselves: changing the behavior to a logical negation or deprecating and eventually removing ~bool. He saw no good migration path for switching to negation, though, so he opted for deprecation. The discussion continued on for another month or so before winding down without any firm conclusion. There was talk of a PEP, but that did not come about either.

The thread sparked up again in October 2025 and Hoffmann responded to a query about the PEP, pointing to the bug discussion and his summary earlier in the thread. At that time, Tim Peters also posted about a change that he had to make to his code because of the deprecation; he thought it was far too late in the history of the language to be making breaking changes of that sort:

All computer languages have quirks. Python is, IMO, too mature and widely used now to risk changing much of any visible behaviors, short of screaming bugs, or (but less compellingly so) accidents of implementation that were never documented as "advertised" behavior.

There's nothing surprising about ~bool to people who learn the language. bool is a subclass of int in Python, period. I don't give a hoot how it works in other languages. The time for that kind of argument was when Python's semantics were first crafted. It's too late for that now.

The present

Things went quiet again until mid-February 2026, when Hayden Welch posted a concern, but had misinterpreted what was being deprecated. It led to more discussion, naturally, much of it between Hoffmann and Peters, along with a reminder from Stefan Pochmann about his use case. That caused Hoffmann to start a parallel thread to gather real-world impacts of the deprecation, which currently just has a link to Pochmann's use case and a brief mention of the deprecation (or, really, someday elimination) of ~bool being a violation of the Liskov substitution principle (which had also come up elsewhere in the discussions). Essentially, if bool is to be a subtype of int, it has to be able to be used wherever an int can be and ~ surely qualifies.

In the main thread, though, Van Rossum said that the discussion made him cry. "The inconsistency of disallowing ~x when x is a bool while allowing it when x is an int trumps the lack of a use case here." That was, of course, a complete reversal of his position back in 2022, and also different from his 2016 advocacy of a quick switch to a Boolean result for ~bool. In another message, he confirmed the reversal:

Right, I've changed my mind. Or maybe I wasn't thinking far enough ahead at the time.

I would be okay if ~b where b is statically typed as bool might trigger a warning in linters or static type checkers.

Around the time of Van Rossum's change of heart, the thread seems to have picked back up, at least for a bit. In response to Peters's argument that people mistakenly using ~ for logical not are terribly confused, "H. Vetinari" claimed that they were not, since NumPy and the like have popularized the idea, but "that it only works for arrays". Peters was strongly convinced that the NumPy model would not be good for Python as a whole to follow, however. For one thing, it works on more than just arrays, "but the conceptual model is baffling". He provided a number of examples showing how NumPy is internally inconsistent in its handling of its bool type.

Everything Python does follows from that bool is a subclass of int. That's all you have to remember. numpy's bool stands as unique in its type system, and is not even "a numeric type" there - although various operations' special cases make it act like one in various ad hoc ways.

It's simply incoherent, a grab-bag of special cases. The core language shouldn't budge the width of an electron to try to cater to any such stuff.

Matthew Barnett raised the seeming oddity of bitwise & and | returning a bool result, while ~ does not; that was inconsistent in his eyes, as it was in plenty of others' along the way. James Dow largely or completely demolished that argument with extensive references to the language documentation. The language reference pretty clearly shows that the existing behavior is required; an implementation is not actually Python without allowing ~bool. Since bool is an int, the bitwise and/or operators are consistent as well: "True | False must return an integer with a value of 1 (which True is) and True & False must return an integer with a value of 0 (which False is)." Tom Fryers also had a lengthy explanation that showed why the Liskov substitution principle matters, and that real breakage results from deprecating the ~bool operation, even though that operation is perhaps weird and unlikely.

Hoffmann seems amenable to reversing course on the deprecation. In the abstract, that should be easy enough to do; code that changed due to the warning will continue to function just fine if the warning goes away. It is not entirely clear how a decision like that would be made, but one guesses the steering council will be brought in at some point to make a pronouncement. There is no huge rush, at least until the time comes to turn the warning into an exception, which is a year or more off at this point.

Overall, the mood seems to be shifting away from deprecation. Using inversion on a bool is a bit of a dark corner of the language, for sure, and it may have been a mistake not to create a separate Boolean type; certainly some in the discussions believe so. The confusion comes to those who think the language does have a separate Boolean type, and it would be nice to find a way to warn them, but removing the feature altogether seems like a step too far.

The long journey for ~bool is probably not over, but perhaps some kind of ending will come before long. This episode demonstrates a number of aspects of the Python development process over the years, from its more freewheeling days 20 or more years ago through its more stodgy aspect these days. Throughout, we see the general cordiality and collegial nature of its discussions; one suspects we have not seen the last of this odd corner of the language, but that further discussion or development will proceed along the same genial lines. Both the language and the community are rather mature at this point—and it shows.





A different angle

Posted Feb 27, 2026 16:50 UTC (Fri) by nowster (subscriber, #67) [Link] (12 responses)

A more consistent fix would be for True to evaluate to an int representation with all bits set (ie. -1 in two's complement), and False to an int representation with all bits cleared (ie. zero).

A different angle

Posted Feb 27, 2026 16:57 UTC (Fri) by daroc (editor, #160859) [Link] (5 responses)

This is (as I suspect you may know) what Forth does. Unfortunately, making True no longer represented by 1 would be a larger breaking change in Python.

A different angle

Posted Mar 1, 2026 10:10 UTC (Sun) by LtWorf (subscriber, #124958) [Link] (3 responses)

Why? bool(1) would still evaluate as true.

Anyway working with typedload, I kinda hate that bool and int are the same thing. It means that giving load(value, int | bool) is kind of meaningless because bool is an int so if value was "True" you might get a 1 or a "True", depending on how the union gets randomly sorted at runtime.

I think there's no fix for that other than documenting "don't do that".

A different angle

Posted Mar 9, 2026 8:58 UTC (Mon) by IkeTo (subscriber, #2122) [Link] (1 responses)

Like: you were able to index into a length 2 array using bools, changing it to -1... you still can, but it becomes a very obscure operation. Change the array to a length 3 array and you get a changed behavior.

You were able to count the number of True elements in a collection by summing them up. Changing it to -1... you still can, but you need an extra negation at the end. Code becomes buggy until you do, and code that needs to work in both versions needs something like sum(bool_collection) * ((-1) ** (sys.version_info >= (3, x, y))) for some appropriate x and y.

True == 1 is not unique to Python. In C and C++ you have that too, and likewise ~true is interpreted as true. I agree with Peters here: Python is too mature for such changes.

A different angle

Posted Mar 9, 2026 21:46 UTC (Mon) by LtWorf (subscriber, #124958) [Link]

Python breaks backwards compatibility at every single release. I think for java there would be the argument of not breaking backwards compatibility, but in python's case, if it's not a consideration for all the other things that get constantly broken, why worry about it now?

A different angle

Posted Mar 12, 2026 17:52 UTC (Thu) by callegar (guest, #16148) [Link]

Consider that one not uncommon way to express a piecewise function is by expressions like `(x<5)*f(x)+(x>=5)*g(x)`... Suddenly changing True to -1 would break that.

A different angle

Posted Mar 1, 2026 19:10 UTC (Sun) by hholzgra (subscriber, #11737) [Link]

Have been bitten by how MS BASIC dialects, up to VBA in 1990s MS Access, did it this way.

We were migrating an old home grown Access application to Oracle as database and a PHP web frontend.

The "database guy" was not really up to the job, and used some conversion tool that converted MS Access / Jet Engine "bool" to Oracle "VARCHAR2(1)" (don't ask, please ...)

That led to false being "0", and "true" being just "-" ... and by the time we discovered that it was already too late in the project to redo the whole data conversion, or even just an

UPDATE table SET flag='1' WHERE flag='-';

The "fun" of "Enterprise" projects ...

A different angle

Posted Feb 27, 2026 17:18 UTC (Fri) by ballombe (subscriber, #9523) [Link] (1 responses)

Amusingly, in Microsoft Basic 1.0, True was -1, but there was no bitwise-not operator.

A different angle

Posted Mar 3, 2026 14:39 UTC (Tue) by stevie-oh (subscriber, #130795) [Link]

This is incorrect; in BASIC, the NOT operator is bitwise. (In fact, all of those operators -- AND, OR, XOR, etc. -- were bitwise.)

That is why TRUE in BASIC was -1: because FALSE = 0, and TRUE = NOT FALSE.

I admit that BASIC as a language was not formally standardized, so each different BASIC implementation had its own idiosyncrasies.

Fortunately, archive.org has a working copy of Microsoft Basic 1.0 (for the Macintosh, oddly enough) here: https://archive.org/details/mac_MSBASIC_1

Running the command `PRINT(NOT 5)` will output -6.

A different angle

Posted Feb 27, 2026 18:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Visual Basic and COM/OLE on Windows used this convention. So 'true' was -1, making bitwise and logical negations give equal results.

A different angle

Posted Feb 28, 2026 4:05 UTC (Sat) by mirabilos (subscriber, #84359) [Link] (2 responses)

Or bool into an integer with the width of 1 bit, instead of 32 or so.

A different angle

Posted Feb 28, 2026 14:18 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

That's what PL/1 did - at least the version I used. Caused fun when mixing FORTRAN and PL/1 code (or derived code), because PL/1 would use a 16-bit word to store 16 booleans from left to right, and of course FORTRAN, using 0 and 1, would store the significant bit on the right ...

Not a problem so long as you knew what you were dealing with, and in a strongly typed language (I include FORTRAN in that as everything had to be typed, even if it didn't have to be declared), not that hard to deal with either.

Cheers,
Wol

A different angle

Posted Mar 2, 2026 13:55 UTC (Mon) by geert (subscriber, #98403) [Link]

Of course, IBM numbers bits from left (MSB) to right (LSB).

The unbearable clunkiness of Python

Posted Feb 27, 2026 19:00 UTC (Fri) by fman (subscriber, #121579) [Link]

Is thus (re)confirmed ..

Just make it a warning!

Posted Feb 27, 2026 19:09 UTC (Fri) by ringerc (subscriber, #3071) [Link]

The behavior is surprising and has little practical use, but removing it is worse.

So make it a warning, visible by default. Not a deprecation warning. A plain warning that this may not do what you expect it to do. Emitted once per execution.

Allow files to import a magic module (like __future__) to suppress it in their code, if they intentionally use the behavior.

another different angle

Posted Feb 27, 2026 19:45 UTC (Fri) by amh131 (subscriber, #41314) [Link] (8 responses)

Here’s a probably foolish idea - if Python bool is a subset of ints then why not define them as the two equivalence classes of int mod 2? 0 and 1 are certainly representatives of the classes but so are many others. And bitwise inversion works. So do addition and multiplication, which might be less welcome though.

another different angle

Posted Feb 27, 2026 20:34 UTC (Fri) by dskoll (subscriber, #1630) [Link]

So even numbers are False and odd numbers are True? Are we going to have an Obfuscated Python Code Contest at some point?

another different angle

Posted Feb 27, 2026 22:58 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (5 responses)

The problem is that Python currently provides True == 1 and False == 0. If we apply the Liskov substitution principle, True + True == 2 under current Python. If we change that to True + True == False, we must have False == 2 to uphold LSP. But then we have 0 == False == 2, and transitivity of equality is broken.

another different angle

Posted Feb 27, 2026 23:40 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

Or, to put it another way: An equivalence class of integers is not the same thing as an integer, and it's too late to make such a dramatic change to the bool type now.

another different angle

Posted Feb 28, 2026 2:14 UTC (Sat) by amh131 (subscriber, #41314) [Link] (2 responses)

It is indeed very late for such an idea. I’m not sure the transitive thing (LSP) is such a barrier though. If 0 == False and nonzero == True then True == -5 and True == +5 so we can still get True + True == 0 == False. We don’t need negative values either since we can use overflow! Our finite integer set is a kind of equivalence class all by itself.

Anyways, more of a mental exercise in consistency than a serious proposal for change.

another different angle

Posted Feb 28, 2026 2:43 UTC (Sat) by iabervon (subscriber, #722) [Link]

But True != 5 and True != -5. -5 and 5 are "truthy" (lead to the consequent rather than the alternative in "if" statements), but not equal to True. Also, Python ints don't overflow; they automatically go into larger-than-word values if they get too big, and you eventually run out of memory rather than getting an answer that is equivalent to a different integer.

another different angle

Posted Mar 3, 2026 21:36 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

> If 0 == False and nonzero == True then True == -5 and True == +5 so we can still get True + True == 0 == False.

Which True is 5 and which True is -5? Are you proposing multiple True values? Is True just a thin facade over an odd int?

If so, what is the value of bool(4)? Currently, it's True, but your proposal seems to suggest that it should construct a False instance whose hidden int value is 4.

> We don’t need negative values either since we can use overflow!

Not in Python you can't. Python ints are bignums. They never overflow.

another different angle

Posted Mar 2, 2026 6:47 UTC (Mon) by pdewacht (subscriber, #47633) [Link]

And summing bools is perfectly reasonable in Python. E.g. sum(x==0 for x in list) to find the number of zeroes in a list.

another different angle

Posted Feb 27, 2026 23:58 UTC (Fri) by willy (subscriber, #9762) [Link]

I think you're close, but you're using the wrong bit. Let's say the sign bit is the determining factor -- >= 0 is False, <0 is True. CPUs already make it cheap to test the sign bit. Now ~ works correctly. Adding True to True remains negative until you do it two billion times.

Interesting Weirdness

Posted Feb 27, 2026 20:40 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link] (1 responses)

I have never run into this, because I know to use `not` rather than `~`. They're doing the right thing here by not breaking people's stuff, so they deserve kudos for that.

If they'd learned that "don't break people's stuff" lesson many years ago, maybe Python versions after 2.7 wouldn't be dead to me ;)

Interesting Weirdness

Posted Mar 1, 2026 10:22 UTC (Sun) by LtWorf (subscriber, #124958) [Link]

Going with all the breaking changes in python 3 was still much better than breaking 1 or 2 things at every release like they do now, so instead of a big transition it's a constant work of fixing stuff that they broke.

Bool funnyness in other languages

Posted Feb 27, 2026 21:59 UTC (Fri) by joib (subscriber, #8541) [Link] (21 responses)

E.g. in C if one does type punning of _Bool variables funny things can happen if one ends up with a trap representation. For a recent story of someone stepping on this particular landmine https://blog.svgames.pl/article/the-little-bool-of-doom

Bool funnyness in other languages

Posted Mar 2, 2026 10:28 UTC (Mon) by PeeWee (subscriber, #175777) [Link] (20 responses)

Well, he asked for it, I guess. Should have #define'd the source as C89 and called it a day.

On a more general note, I think both the Python issue at large and this particular C episode show that it's ill-advised to mix boolean math with arithmetic. That has always been icky, because nobody ever said that true == 1 and false == 0, for instance, which makes reliance on it a prime example of undefined behavior; negative logic does exist. Not being a C whiz, I nonetheless suspect that the memset (sprtemp,-1, sizeof(sprtemp)) is the real culprit in the mentioned example.

Bool funnyness in other languages

Posted Mar 2, 2026 11:52 UTC (Mon) by khim (subscriber, #9252) [Link] (19 responses)

> because nobody ever said that true == 1 and false == 0, for instance

WDYM? It's quite literally part of the standard!

C23 turns true and false into keywords, but they still retain numeric values…

Bool funnyness in other languages

Posted Mar 2, 2026 18:37 UTC (Mon) by PeeWee (subscriber, #175777) [Link] (18 responses)

If you have to look behind that curtain to know that, it is implementation defined, at best. There's a reason for having the symbolic names for those values. They abstract away the exact implementation. Otherwise, why have those macros to begin with? We are not supposed to know that; it's encapsulation.

Bool funnyness in other languages

Posted Mar 2, 2026 20:38 UTC (Mon) by joib (subscriber, #8541) [Link] (16 responses)

It's not implementation defined, it's specified in the standard that true==1 and false==0.

Now, one could argue that it was a mistake to come up with the concept of trap representations, which IIRC apply only to boolean variables whose integer contents are neither 0 nor 1 (only possible via type punning of some sort). It would have been cleaner to e.g. say that 0==false and any non-zero value is true, while the "canonical" value for true is still 1. Yes, it would have meant that some minute optimizations wouldn't be possible, but would have solved issues such as the blog post I quoted above where memset'ing leads to a trap representation and undefined behavior.

Trap representations for C bool

Posted Mar 3, 2026 11:56 UTC (Tue) by farnz (subscriber, #17727) [Link] (10 responses)

Trap representations are the manifestation of the battle between "always does something unsurprising" and "do the fast thing". Without trap representations, the compiler can't assume that a boolean that's "true" has value 1 - it could be any non-zero value - and thus has to optimize in the knowledge that (e.g.) 2 could be "true".

If a boolean could be "true" when the value is not 1, then it becomes difficult to do things like exclusive-or in machine registers; 2 XOR 4 is supposed to be "false" (since both 2 and 4 are "true"), but the machine instruction XOR on a register will give you 6, which is "true". By allowing the compiler to assume that it's always 0 or 1, and nothing else, the compiler can safely output a simple XOR, instead of having to add instructions to coerce non-zero to 1 first if it can't prove, via data flow analysis, that the coercion has already happened. And, of course, if it does add extra instructions, people get upset that it's not doing the "obviously" correct thing - why is it outputting CMP R0, #0 ; MOVNE R0, #1 ; CMP R1, #0; MOVNE R1, #1; XOR R2, R0, R1 when XOR R2, R0, R1 is "obviously correct"?

On the other side, you get surprises like in the blog post, where things that worked when you used char or int stop working as expected with _Bool - these cases would have been surprising if you used XOR, but you never did, so the problem never showed up. And then, when you do switch from int to _Bool, you get surprised by the UB that trap representations cause.

Trap representations for C bool

Posted Mar 3, 2026 13:56 UTC (Tue) by joib (subscriber, #8541) [Link] (9 responses)

Yes, that's what I meant by "minute optimizations". In particular, one somewhat common thing compilers do is implementing "!some_Bool_variable" without requiring a branch as "XOR some_Bool_variable 1". That obviously works only if some_Bool_variable is either 0 or 1.

Is that minute optimization worth introducing a whole new class of errors, aka trap representations, and attendant UB?

Trap representations for C bool

Posted Mar 3, 2026 14:37 UTC (Tue) by taladar (subscriber, #68407) [Link]

I would look at it from the other side. Is the ability to unsafely coerce some random memory location to bool worth giving up all those well defined states that allow these optimizations but also just make everything a lot more deterministic from the point onwards where you once do the check/create the value correctly?

Trap representations for C bool

Posted Mar 3, 2026 15:12 UTC (Tue) by farnz (subscriber, #17727) [Link] (4 responses)

Well, if you want to avoid the trap representations, you can continue to use char, int or similar as your boolean type. However, if you do that, you definitionally do not benefit from optimizations that assume that the value can only be "true" or "false", since the conversion is defined as "0 is false, all other values are true".

C has already decided, long ago, that it's worth having UB that allows for minute optimizations rather than avoiding classes of error. That ship sailed when C declared it UB to dereference a null pointer - which is also only there to enable a minute optimization of removing a load instruction.

More generally, most optimizations are minute; arguing on the basis that an optimization is minute is arguing against optimization in general. There's a handful of big optimizations - dead code elimination, strength reduction, invariant code motion, code + data merging - that are enabled by a large number of minute optimizations like constant folding.

Trap representations for C bool

Posted Mar 3, 2026 15:34 UTC (Tue) by khim (subscriber, #9252) [Link] (3 responses)

> That ship sailed when C declared it UB to dereference a null pointer - which is also only there to enable a minute optimization of removing a load instruction.

Absolutely not. What kind of optimization can you pull from the fact that "A nonempty source file does not end in a new-line character, ends in a new-line character immediately preceded by a backslash character, or ends in a partial preprocessing token or comment" is UB or from the fact that "an unmatched ’ or ” character is encountered on a logical source line during tokenization" is UB?

While the attempts of guys like Victor Yodaiken or Anton Ertl to invent the magic wand that would “make compilers behave” are laughable (a language like C, which both allows optimizations and permits one to include arbitrary machine code, simply has to have unlimited UB, thanks to Rice's theorem), the attempts to say that the insane list of very subtle UBs that C and C++ have accumulated over time is there solely to facilitate optimizations are also laughable, for some UBs are most definitely there not for that.

Rather, it's obvious that we have an unfortunate clash of opinions between two groups:

  1. People who are developing the compilers know that unlimited UBs are 100% unavoidable and thus perceive that all the hundreds of UBs that are there, in the documentation, exist solely to help them facilitate these optimizations.
  2. People who are not compiler writers couldn't even imagine that optimizations can introduce so much craziness, so much brokenness, and naïvely assumed that “sure, compilers may do some strange things when they encounter UB, but, obviously, these strange issues would be easily caught and fixed, thus we can simply mark all the problematic cases where different compilers do radically different things as UB.”

The current sorry state of C and C++ is very much an explosion caused by that misunderstanding between two groups.

And the situation with bool is, probably, not an exception: chances are high that the people who postulated that bool may only hold true or false values, which map to 1 and 0 respectively, never imagined what they were permitting compiler writers to do… until compiler writers started doing these things, and then it's very hard to take their words back without looking like idiots.

Trap representations for C bool

Posted Mar 3, 2026 16:17 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

You appear to be arguing against a strawman of your own creation.

I said that once C declared that it was UB to dereference a null pointer, which by itself only enables some "minute optimizations", it becomes unreasonable to then rail against later changes to C solely on the grounds that they "only enable some minute optimizations".

You, instead, appear to have gone down the route of "some UB in C is insane, therefore dismissing an objection on the basis of 'minute optimizations' as not relevant to C is saying that the insanity of C UB is A-OK".

This is, however, the inherent tension between "C is a high performance language" and "C has no surprising UB"; the C standards committee has consistently chosen "surprising UB but high performance" as the answer to that question, in part because whenever questions get hard, the standards committee's favourite answer has been "surprising UB".

Trap representations for C bool

Posted Mar 3, 2026 19:59 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> You appear to be arguing against a strawman of your own creation.

And you are doing the same.

> I said that once C declared that it was UB to dereference a null pointer, which by itself only enables some "minute optimizations", it becomes unreasonable to then rail against later changes to C solely on the grounds that they "only enable some minute optimizations".

Or it could be “there are some strange corner cases with that on some implementations thus it's better to declare that behavior undefined” kind of decision. We do know some UBs had precisely that motivation thus it's unclear why you declare that “dereferencing of null pointer is UB” was added specifically for optimizations.

> You, instead, appear to have gone down the route of "some UB in C is insane, therefore dismissing an objection on the basis of 'minute optimizations' as not relevant to C is saying that the insanity of C UB is A-OK".

Nope. They are not “insane”. They are “very peculiar and clearly were not added to enable extra “minute optimizations”, but were added for some other reason”. And the existence of such UBs clearly shows us that the implications of [ab]use of some UBs by optimizing compilers were not considered when these UBs were added to the text of the standard.

> the C standards committee has consistently chosen "surprising UB but high performance" as the answer to that question

Do you have any evidence of that? Any whatsoever in any time by any standard?

We do know that some parts of the C standard were actively pushed by the compiler developers (at least from Ritchie's objection to noalias) but we also know that some of them were added to ensure that some existing compilers and environments were doing “something strange” — without any assumption on what that UB would be used for, just because they had some trouble agreeing on what compilers should do, in that case.

Certainly we had a very loud outcry when GCC started using signed overflow for optimizations; there is ample evidence that such optimizations were not expected by a significant number of the people involved.

And we may only say that dereferencing a null pointer was made UB to facilitate optimizations if you can show us a compiler from 1988 or 1989 that has such an optimization pass. Do you have any in mind?

> in part because whenever questions get hard, the standards committee's favourite answer has been "surprising UB".

Which would be criminal negligence in a world where any and all optimizations could be used by compilers to create “nasal daemons”. But in a world where “the implementor may augment the language by providing a definition of the officially undefined behavior” this makes perfect sense. And yes, that sentence is very much part of the official rationale.

We know (not “guess”, but know) that the expectation was the total opposite of what has actually happened: the idea was that the standard would include lots of UBs, some of them “sane”, some of them “insane”, but then actual compilers would pick the set of “sane” UBs that both compiler writers and developers may live with… and, eventually, the standard would remove some of these UBs.

I seriously doubt anyone could have even imagined a world where compiler developers would [ab]use any and all UBs that the standard lists and then [try to] add more to the list! That was definitely not planned… and yet you assert that it was an explicit and conscious choice… do you have any evidence for your position?

In fact, the “reading error” that Yodaiken tries to use as a justification for his demand for a “benign C compiler” was there because no one (or, maybe, very, very few) could even envision compilers that would do things like “use detection of null dereference for optimizations” when the original standard was written. We know that because the original standard actually tried to place restrictions on what UB can do: “permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)”. Yes, it's not possible to create a “benign C compiler” by simply placing a restriction on what UB may lead to… but very few people realized that when these words were written, or else they would have picked different language.

“Unrestricted UB” was clearly either not envisioned by anyone when that was written or, maybe, the few who could even imagine it paid little attention to the idea. And “null pointer dereference is UB” was added back then, most likely without any attempt to facilitate optimizations, just as a “natural continuation” in the list of what one shouldn't be able to dereference (“an invalid array reference, null pointer reference, or reference to an object declared with automatic storage duration in a terminated block”).

Trap representations for C bool

Posted Mar 4, 2026 9:31 UTC (Wed) by farnz (subscriber, #17727) [Link]

I didn't say that dereferencing nullptr was made UB to facilitate optimizations - I said that if you're making the argument that UB that enables "minute optimizations" is problematic, then you have to also argue that the idea that "dereferencing nullptr is UB" is also problematic, and the C standards committee, empirically, is not willing to entertain that argument.

The rest of your post is a diatribe about how the compiler authors haven't done what a chunk of the C standards committee expected - but the C standards committee has the power to turn UB into unspecified, non-deterministic, or implementation-defined behaviour if they so choose. Empirically, they don't so choose, and indeed, when adding things like C23 bool to the Standard, they chose to keep UB that was present in C99's _Bool, and that was added in C99 (so pre-1999 compilers are a red herring).

Trap representations for C bool

Posted Mar 3, 2026 17:39 UTC (Tue) by mb (subscriber, #50428) [Link] (1 responses)

This whole new class of errors is not introduced by making bools only 0 or 1 and declaring >=2 as UB.
It is introduced by the lack of enforced conversion checking at int -> bool transmutation time.

Creating a bool >=2 is UB. But the C language is bad at helping the programmer to correctly enforce this.
Papering over this problem by making every bool operation more expensive is a very bad idea.

Trap representations for C bool

Posted Mar 5, 2026 12:32 UTC (Thu) by hmh (subscriber, #3838) [Link]

I was a bit curious about it and checked just how much gcc and clang would try to help you on this (in -O3 -Wall mode).

Both will implicitly convert any scalar type T to _Bool (i.e. do (bool) = (bool)!!(T)) when type punning is not in effect. However, neither compiler specifically tracks it, and thus they will not warn you of the potentially dangerous situation[1].

I did validate that type punning into _Bool will easily cause invalid trap representations without warnings, and often it will also cause unexpected generated code and/or runtime behavior when the optimizer hits the UB. (_Bool) = *((_Bool *)(void *)(& char)) will not throw a warning, and you will quite often get broken code out of it due to the effects of this UB in both gcc and clang optimizers in my testing.

You will also get bad code generation due to the UB when type-punning through a union with int and bool members on gcc 15. I could not get clang (checked clang 10 and 22) to screw this one up: it defanged the UB through an implicit (bool)!!(), but I didn't try very hard.

[1] there exists https://clang.llvm.org/extra/clang-tidy/checks/readabilit...

Trap representations for C bool

Posted Mar 3, 2026 22:03 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

You don't need branches even if foo can take on arbitrary values. SETcc has been a thing since i386.

Besides, 80% of all instances of !foo are inside of a branching construct anyway, so it should often be a "simple" case of swapping out one branch instruction for another, using a different invocation of test, etc., or in other words, free.

Bool funnyness in other languages

Posted Mar 3, 2026 14:06 UTC (Tue) by khim (subscriber, #9252) [Link]

> IIRC apply only to boolean variables whose integer contents are neither 0 nor 1 (only possible via type punning of some sort)

Not on Itanic. There, most types may trap if your program accesses an uninitialized variable.

> Yes, it would have meant that some minute optimizations wouldn't be possible, but would have solved issues such as the blog post I quoted above where memset'ing leads to a trap representation and undefined behavior.

Well… when they were introducing these things it wasn't an option and today it's probably too late to change the standard.

Bool funnyness in other languages

Posted Mar 4, 2026 0:59 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (3 responses)

> It's not implementation defined, it's specified in the standard that true==1 and false==0.

Unfortunately, as of C23, it is not. I just checked.

I'm fairly sure they *meant* to specify this, because otherwise several lines in the standard make no logical sense, but there is a difference between meaning to do something and actually doing it.

To make sure we're clear about which exact version of the standard I'm looking at, see https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

Now for the relevant excerpts:

Section 6.2.5:

> [...]
> An object declared as type bool is large enough to store the values false and true.
> [...]
> The type bool and the unsigned integer types that correspond to the standard signed integer types are the standard unsigned integer types

So far so good. Bool exists, it has two values, those values are true and false, and it is considered an unsigned integer type.

Section 6.2.6.1:

> The representations of all types are unspecified except as stated in this subclause.
> [...]
> Values stored in non-bit-field objects of any other object type are represented using n×CHAR_BIT bits, where n is the size of an object of that type, in bytes. An object that has the value may be copied into an object of type unsigned char [n] (e.g. by memcpy); the resulting set of bytes is called the object representation of the value. Values stored in bit-fields consist of m bits, where m is the size specified for the bit-field. The object representation is the set of m bits the bit-field comprises in the addressable storage unit holding it. Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

Bool has an object representation which may be inspected by casting it to unsigned char, just like every other reasonable type in C.

Section 6.2.6.2:

> For unsigned integer types the bits of the object representation shall be divided into two groups: value bits and padding bits. [...] The values of any padding bits are unspecified. [...] The type bool shall have one value bit and (sizeof(bool)*CHAR_BIT)- 1 padding bits. Otherwise, there is no requirement to have any padding bits; unsigned char shall not have any padding bits.

There is one value bit and the rest are padding bits, but as far as I can tell, we never specify that the value bit must be the least significant bit, so the compiler can put it wherever it likes. Of course, the ABI will probably nail it down, but portable programs should not assume a specific ABI.

> [...] For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.

You can memset(&foo, 0, sizeof(foo)) any integer foo, including bool, to set the value of foo to "zero." But we specified that bool can only hold the values true and false, not one and zero, so that's clearly just wrong. There's no special case for bool, here or in several other places where there probably should be.

Section 6.3.1.3:

> When a value with integer type is converted to another integer type other than bool, if the value can be represented by the new type, it is unchanged.

But how are we supposed to do that when converting *from* bool? True and false are not integers (in the mathematical sense of the word "integer"). There is a (brief) section about how to handle the case of converting into bool, but it reads as if the committee got bored and forgot to specify the reverse conversion.

Of course, all of the above would make perfect sense if false == 0 and true == 1, but I wasn't able to find a line in the standard that actually says that. Maybe they shoved it in a secondary annex or something, I dunno, but it isn't in any of the obvious "here are all of the basic types and how they work" places.

Bool funnyness in other languages

Posted Mar 4, 2026 1:28 UTC (Wed) by mussell (subscriber, #170320) [Link] (1 responses)

In the N1570 draft for C11, where bool was still _Bool, they do in fact define true == 1 and false == 0. More specifically, in section 6.2.5, they have:

> An object declared as type _Bool is large enough to store the values 0 and 1.

C23 changes this to say the result is false/true instead of 0/1 respectively. The same change was made to section 6.3.1.2:

> When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.

Section 7.18, which defines the stdbool.h header, states:

> The remaining three macros are suitable for use in #if preprocessing directives. They are
>
> true
>
> which expands to the integer constant 1,
>
> false
>
> which expands to the integer constant 0, and
>
> __bool_true_false_are_defined
>
> which expands to the integer constant 1.

The C23 draft only defines the last macro.

Bool funnyness in other languages

Posted Mar 4, 2026 6:18 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

So if I'm interpreting this correctly, they *used* to have this right, and then they broke it? Why???

Bool funnyness in other languages

Posted Mar 4, 2026 12:00 UTC (Wed) by jem (subscriber, #24231) [Link]

> Of course, all of the above would make perfect sense if false == 0 and true == 1, but I wasn't able to find a line in the standard that actually says that.

On page 66, 6.4.4.6 Predefined constants:

"The keywords false and true are constants of type bool with a value of 0 for false and 1 for true."

Bool funnyness in other languages

Posted Mar 3, 2026 14:00 UTC (Tue) by khim (subscriber, #9252) [Link]

> Otherwise, why have those macros to begin with?

For compatibility with C++, obviously.

> We are not supposed to know that; it's encapsulation.

If “we are not supposed to know that”, then why do they specify it in the standard?

Booleans are just weird

Posted Mar 1, 2026 11:42 UTC (Sun) by fw (subscriber, #26023) [Link]

The ~ behavior is not the only one that is accidentally wrong: mathematical implication => has to be written as <=, so the arrow points the wrong way.
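A small sketch of that point, for illustration only: because bool values compare as the integers 0 and 1, `a <= b` happens to give the truth table of "a implies b". The helper name `implies` is made up here and is not part of Python.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Logical implication: false only when a is true and b is false."""
    return (not a) or b


# Compare the textbook definition with the integer comparison on bools.
table = [(a, b, implies(a, b), a <= b)
         for a, b in product((False, True), repeat=2)]

for a, b, expected, via_le in table:
    print(f"{a!s:5} => {b!s:5}: implies={expected!s:5} (a <= b)={via_le}")
```

The two columns agree on all four rows, which is why the comparison operator that looks like a left-pointing arrow ends up meaning implication from left to right.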

Linters

Posted Mar 1, 2026 16:56 UTC (Sun) by marcH (subscriber, #57642) [Link]

> if ~b where b is statically typed as bool might trigger a warning in linters or static type checkers.

Is it not already the case? Really should be. I'm surprised this was the only reference to linters.

Because they are optional and _run at a different time_, linters offer a very nice, "softer" alternative to warnings in the language itself. They are also much more flexible, I mean they can also be fine-tuned much better than warnings in the language itself.

So:
- On one hand, demanding people and projects can use types and linters and are warned. Stackoverflow and chatbots become full of advice not to do ~b. Documentation overkill.
- On the other hand, "cow-boys" who don't do types, linters and don't read documentation can keep doing ~b without being bothered at all.

Best of both worlds, everyone is happy. Problem solved!

Finding actual bugs with linters (as opposed to merely checking code style) can rightfully be considered as a "bandaid". Example: memory safety in C/C++. That should really be built in the language and compilers themselves and not left to external checkers; it's too serious! </rust troll>

But other times, a bandaid is the best tool for the job. This looks like one of those tricky cases.
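As a rough illustration of how an external checker could flag this without touching the language, here is a minimal sketch built on the standard `ast` module. The function names `find_bool_inversions` and `_is_bool` and the `SAMPLE` snippet are hypothetical; the check only looks at names annotated exactly as `bool`, ignores scoping, and is nowhere near what a real linter or type checker does.

```python
import ast


def _is_bool(annotation) -> bool:
    """True if the annotation node is the bare name `bool`."""
    return isinstance(annotation, ast.Name) and annotation.id == "bool"


def find_bool_inversions(source: str) -> list[int]:
    """Return line numbers where ~ is applied to a name annotated as bool."""
    tree = ast.parse(source)

    # Collect names annotated as bool (scope-insensitive, by design of the sketch).
    bool_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and _is_bool(node.annotation):
            bool_names.add(node.arg)  # e.g. def f(flag: bool)
        elif (isinstance(node, ast.AnnAssign)
              and isinstance(node.target, ast.Name)
              and _is_bool(node.annotation)):
            bool_names.add(node.target.id)  # e.g. flag: bool = True

    # Report every `~name` whose operand is one of those names.
    hits = [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.UnaryOp)
            and isinstance(node.op, ast.Invert)
            and isinstance(node.operand, ast.Name)
            and node.operand.id in bool_names]
    return sorted(hits)


SAMPLE = """\
def toggle(flag: bool, mask: int):
    if ~flag:
        return ~mask
    return flag
"""

print(find_bool_inversions(SAMPLE))
```

In the sample, only the `~flag` on the second line is reported; `~mask` is left alone because `mask` is annotated as `int`.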

Maybe warn if ~bool is used as bool.

Posted Mar 2, 2026 3:05 UTC (Mon) by gmatht (subscriber, #58961) [Link]

Something like `b=True; if (~b): print ("~b")` serves no purpose since ~bool is always truthy. At the risk of extra complexity perhaps it should warn about ~b if (1) the result is used as a bool (2) ~ is not overridden; and (3) b is known to be always a bool. Using ~b as a shortcut for True might really be taking golfing too far (and also not far enough since 1 is shorter).
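A short sketch of why the result is always truthy, assuming a Python version where ~ on a bool is still accepted (Python 3.12 and later emit a DeprecationWarning, which is silenced here). The function name `invert_results` is made up for the example.

```python
import warnings


def invert_results() -> dict:
    """Map each bool b to the tuple (~b, bool(~b), not b)."""
    results = {}
    with warnings.catch_warnings():
        # Silence the DeprecationWarning that newer Pythons emit for ~ on a bool.
        warnings.simplefilter("ignore", DeprecationWarning)
        for b in (True, False):
            inverted = ~b  # integer complement: ~1 == -2, ~0 == -1
            results[b] = (inverted, bool(inverted), not b)
    return results


for b, (inverted, truthy, logical_not) in invert_results().items():
    print(f"~{b} = {inverted}, bool(~{b}) = {truthy}, not {b} = {logical_not}")
```

Both complements are non-zero integers, so a test such as `if ~b:` always takes the branch, whereas `not b` gives the logical negation.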

They say PHP is weird ...

Posted Mar 9, 2026 14:49 UTC (Mon) by euneuber (subscriber, #8291) [Link]

... but it got boolean right :-)


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds