
Cro: Maintain it With Zig

Cro: Maintain it With Zig

Posted Sep 10, 2021 0:36 UTC (Fri) by HelloWorld (guest, #56129)
Parent article: Cro: Maintain it With Zig

The article claims that C++ is moving forward too slowly. Well, that's just ridiculous. In fact it's hard to think of any language that has changed faster and more radically than C++ over the last 10 years. There are now variadic templates, auto, uniform initialization, lambdas, modules, concepts, rvalue references, constexpr, consteval, simplified for loops, coroutines, enum classes, class template argument deduction, initializer lists and a bunch of others... And there's no sign of this slowing down any time soon.



Cro: Maintain it With Zig

Posted Sep 10, 2021 5:35 UTC (Fri) by tlamp (subscriber, #108540) [Link] (16 responses)

He talks about cruft, and your statement seems to underline his worries a bit: it sounds like the rate at which cruft gets added is not exactly small.

I'm not saying that adding new features is bad, nor do I have the in-depth C++ experience to actually judge its ecosystem; that's just how I read it. From the outside it seems that each C++ version gets many features bolted on. It's great to have that much power, but I imagine it takes quite some discipline and all the more refactoring, as some contributors surely want to use all that new shiny stuff, while one normally also wants to avoid creating a big ball of mud.

Cro: Maintain it With Zig

Posted Sep 10, 2021 11:12 UTC (Fri) by excors (subscriber, #95769) [Link] (15 responses)

I think a lot of the recent C++ changes aren't about making it more powerful, they're making it easier to use the power that the language already had. Like you can now write "std::lock_guard lock(some_mutex);" instead of "std::lock_guard<std::mutex> lock(some_mutex);" (thanks to class template argument deduction) - not a big change, but it makes the code a bit cleaner. Or more substantially, with features like "if constexpr" you can do metaprogramming (i.e. code that's executed at compile-time, and its inputs/outputs can be both values and types) in a procedural style that's quite similar to regular C++, whereas previously you had to write everything in a weird recursive functional style with horrible SFINAE tricks. And there's a lot of language cleanups so that code which always seemed natural to write in C++ but previously generated obscure compiler errors, now compiles correctly and does what you'd expect.
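To make those two examples concrete, here is a minimal sketch of my own (C++17 assumed). The names `counter`, `increment` and `describe` are made up for illustration; only `some_mutex` comes from the snippet above.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <type_traits>

std::mutex some_mutex;  // hypothetical global, named after the snippet above
int counter = 0;        // hypothetical shared state protected by some_mutex

// Class template argument deduction (C++17): no need to spell <std::mutex>.
int increment() {
    std::lock_guard lock(some_mutex);
    return ++counter;
}

// "if constexpr" (C++17): the branch that does not apply is discarded and
// never instantiated, so one ordinary-looking function can stand in for a
// pair of SFINAE-constrained overloads.
template <typename T>
std::string describe(const T& value) {
    if constexpr (std::is_integral_v<T>) {
        return "integer " + std::to_string(value);
    } else {
        return "text " + std::string(value);
    }
}
```

With these definitions, `describe(42)` takes the integral branch and `describe("hi")` takes the other one; the pre-C++17 equivalent would typically need two overloads guarded by `std::enable_if`.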

The language specification is getting more complicated, but programs written in the language can now be simpler and less crufty, which seems like a good tradeoff.

Cro: Maintain it With Zig

Posted Sep 10, 2021 15:47 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link] (14 responses)

> The language specification is getting more complicated, but programs written in the language can now be simpler and less crufty, which seems like a good tradeoff.

The language itself gains multiple ways of doing things (the old old way, the old way, the new way), and since you can't drop compatibility with such a large number of users, any programmer is likely going to have to learn all of them. Organizations have tried to tackle this by limiting themselves to a subset, but that subset differs depending on which codebase you are looking at. With such a long history, I am not sure that is an easy problem to solve.

Cro: Maintain it With Zig

Posted Sep 11, 2021 0:34 UTC (Sat) by roc (subscriber, #30627) [Link] (13 responses)

It's even worse than that. When you import code from one project to another the C++ subsets used are likely to be different.

Cro: Maintain it With Zig

Posted Sep 12, 2021 0:13 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (12 responses)

Here's (C++ committee member) Nicolai Josuttis on the conflicting C++ language style guides:

https://www.youtube.com/watch?v=WRQ1xqYBKgc

Particularly egregious is the fact that the "Core Guidelines" recommend against East Const, on the basis that even though clearly East Const is better, and could be automatically checked, it's not popular and so you should just learn the more complicated unintuitive West Const rules instead. It's not clear what the purpose of such a guide even is when it defers to popularity so easily.
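For anyone who hasn't met the terms, a small sketch of the two spellings (my own illustration, C++17 assumed; the alias names and `read` are hypothetical):

```cpp
#include <cassert>
#include <type_traits>

// West const: the qualifier is written to the left of the type.
using WestPtr = const int*;
// East const: the qualifier is written to the right of what it applies to.
using EastPtr = int const*;

// Both spellings name exactly the same type: pointer to const int.
static_assert(std::is_same_v<WestPtr, EastPtr>, "same type, different style");

// A const *pointer* can only be written with const to the right of the "*".
// East const applies that one rule everywhere.
using ConstPtrToInt = int* const;
static_assert(!std::is_same_v<EastPtr, ConstPtrToInt>, "different types");

// Reads through a pointer to const int; the pointee cannot be modified here.
int read(int const* p) {
    assert(p != nullptr);
    return *p;
}
```

The style choice changes nothing for the compiler; the argument is only about which rule is easier to teach and to check mechanically.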

Cro: Maintain it With Zig

Posted Sep 12, 2021 2:28 UTC (Sun) by HelloWorld (guest, #56129) [Link] (9 responses)

Frankly that talk only demonstrates that many style guides are simply retarded.

A particularly idiotic example is the rule from MISRA that every switch statement must have a default branch. The compiler can and will check exhaustiveness when switching over an “enum class” type, so the rule doesn't achieve anything useful here. But when you add a new enumerator to the enum class, your compiler will now no longer be able to warn you about a missed case, because those are already handled by the default branch! So this rule actively harms programmers by depriving them of a useful language feature.
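A minimal sketch of what I mean, assuming GCC or Clang with -Wswitch enabled (GCC turns it on with -Wall). The `Color` enum and both function names are hypothetical.

```cpp
#include <cassert>

enum class Color { Red, Green, Blue };  // hypothetical enum for illustration

// No default: if a new enumerator is later added to Color, -Wswitch reports
// that it is not handled in this switch.
int rank_without_default(Color c) {
    switch (c) {
        case Color::Red:   return 1;
        case Color::Green: return 2;
        case Color::Blue:  return 3;
    }
    return -1;  // reached only by values that match no enumerator
}

// MISRA-style default: a newly added enumerator silently lands in the
// default branch, and -Wswitch stays quiet.
int rank_with_default(Color c) {
    switch (c) {
        case Color::Red:   return 1;
        case Color::Green: return 2;
        default:           return 3;  // meant for Blue, but catches anything
    }
}
```

Adding, say, `Yellow` to `Color` would produce a warning in the first function and nothing at all in the second.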

And when it comes to MISRA, that's just the tip of the iceberg.

switch

Posted Sep 12, 2021 8:08 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (2 responses)

Still a language mis-feature :D

In Rust, match (the closest feature to switch) must be exhaustive, you may choose to either write a default case *or* cover every possible value but not both as that's an error, for the same reason matching '7' twice in some digit matching code would be an error.

If the library you're using knows they might want to add more values to the enumeration they can declare it to be #[non_exhaustive] which signals to the compiler that the former scenario (cover every case explicitly) isn't enough after all and you must always supply a default. This way when the next library upgrade adds another value your default match covers that. USFederalHoliday should likely be #[non_exhaustive] but you don't need a default case for CalendarMonth or DayOfWeek.

If they choose not to write #[non_exhaustive] but then they do add a value to the enumeration anyway this is a backwards incompatible change and your code won't compile until it's adjusted to cope with the new value.

As a result the desired effect of the MISRA rule is always in place in Rust, while the dangerous behaviour is not possible, a guideline is unnecessary.

switch

Posted Sep 12, 2021 11:23 UTC (Sun) by HelloWorld (guest, #56129) [Link] (1 responses)

In safety critical applications, which after all typically run in some sort of embedded system, I don't think you even need something like #[non_exhaustive]. When a new enumerator is added, you really should take another look and not just hope that your old default clause still makes sense. Binary compatibility is not that much of an issue in embedded systems because you don't usually upgrade shared libraries independently.

Besides, this is purely a tooling problem. If MISRA wants to enforce exhaustiveness, they can do so. But apparently their tool vendors are just too lazy and they prefer forcing people to write dead code instead.

switch

Posted Sep 20, 2021 10:42 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

Exhaustiveness in C is really hard because the enumerated type is just a funny way to spell an integer as explained in the other sub-thread.

Obviously rustc turns your simply enumerated type into an integer in the machine code too, but this happens in an IR after you can't touch it, so the only rule needed to avoid setting yourself on fire is "No unsafe code", i.e. write #![forbid(unsafe_code)] and you're done.

Cro: Maintain it With Zig

Posted Sep 12, 2021 11:34 UTC (Sun) by excors (subscriber, #95769) [Link] (5 responses)

> The compiler can and will check exhaustiveness when switching over an “enum class” type, so the rule doesn't achieve anything useful here.

I think that's incorrect, because it's legal to cast an integer to an "enum class" type even if it's not one of the declared enumerators. Then it wouldn't match any of your 'exhaustive' cases and you need to handle it with a default case (or intentionally rely on the default default behaviour of falling off the bottom of the switch).

(This surprised me when I discovered it recently.)

Specifically, according to C++17 every enum has an 'underlying type'. For "enum E : T {}" and "enum class E : T {}", it is 'fixed' as T. For "enum class E {}", it is fixed as int. The 'values of the enumeration' are the values of the underlying type, e.g. for "enum class E {}" it's all values of type int.

For "enum E {}", the underlying type is not fixed and is implementation-defined. The values of the enumeration are (basically) from 0 up to the smallest 2^N-1 that will fit all the defined enumerators, which may be a smaller range than the underlying type.

With a fixed underlying type, casting an integral value to the enumeration type will convert it to the underlying type first, by the usual integer rules. That means it will always be one of the values of the enumeration, so the cast is always allowed.

With a non-fixed underlying type, casting is only allowed if the integral value is within the range of the enumeration values. E.g. if you have "enum E { one=1, six=6 };" then the range is 0 to 2^N-1 with N=3, so (E)2 and (E)7 are permitted but (E)8 is undefined behaviour. Clang's UndefinedBehaviorSanitizer helpfully detects that: "runtime error: load of value 8, which is not a valid value for type 'E'".
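Here is a small sketch of the two cases as I understand them. `Fixed`, `Unfixed`, `to_fixed` and `to_unfixed` are names I made up, and the 0 to 7 range in the assert is my reading of the rule for this particular enum.

```cpp
#include <cassert>
#include <cstdint>

enum class Fixed : std::uint8_t { one = 1, six = 6 };  // fixed underlying type
enum Unfixed { one = 1, six = 6 };                      // underlying type not fixed

// Defined for any n: the value is first converted to uint8_t by the usual
// integer rules, and every uint8_t is a value of the enumeration.
Fixed to_fixed(int n) {
    return static_cast<Fixed>(n);
}

// Defined only for 0 <= n <= 7 (N = 3 bits are enough for the largest
// enumerator, 6). Anything outside that range is undefined behaviour, so it
// is checked before the cast.
Unfixed to_unfixed(int n) {
    assert(0 <= n && n <= 7);
    return static_cast<Unfixed>(n);
}
```

So `to_fixed(200)` yields a perfectly valid `Fixed` that matches neither enumerator, and `to_unfixed(2)` does the same for the unscoped enum.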

(That restriction is specifically for casting - the standard says "It does not preclude an expression of enumeration type from having a value that falls outside this range". I guess something like "std::underlying_type_t<E> n = 8; E e; memcpy(&e, &n, sizeof(e));" might be a legal way to generate such a value, but I'm not familiar enough with the rules to be certain.)

So I think about the only situation where you can exhaustively switch on an enum without a default case, is when it's "enum E : uint8_t" / "enum class E : uint8_t" and you define enumerators for every value from 0 to 255. In all other cases, for both "enum" and "enum class", it's perfectly legal to have values of the enumeration type that are not one of the enumerators. You need to either do some global analysis of your program (which is outside the scope of C++) to prove you never generate such values, or write code that's locally safe by handling the default case in every switch.

This does make the compiler's "warning: enumeration value '...' not handled in switch" warnings quite silly, because if you forget to handle one enumerator but have a default case you won't get that warning, and if you remove the default case (to enable the warning) and handle every enumerator (to fix the bug revealed by that warning) then the warning goes away even though you've just added billions of unhandled enumeration values. In the latter case, at least GCC will sometimes still warn you that "control reaches end of non-void function" despite you handling every declared enumerator - Clang suppresses that warning and silently generates code that will trigger undefined behaviour at runtime when given a valid enumeration value that doesn't match any of the supposedly-exhaustive cases.

Cro: Maintain it With Zig

Posted Sep 12, 2021 14:45 UTC (Sun) by HelloWorld (guest, #56129) [Link] (4 responses)

> I think that's incorrect, because it's legal to cast an integer to an "enum class" type even if it's not one of the declared enumerators.
The problem here is the cast, not the lack of a default clause. Why doesn't MISRA forbid that? That would actually make sense...

Besides, what useful thing could you possibly do in such a default clause? Because after all, the whole point of an enum type is that it can only hold one of a number of enumerated values. Therefore, when you encounter a value that isn't among them, your program is already in a state that the developers didn't foresee, and hence couldn't possibly know how to rectify.

Cro: Maintain it With Zig

Posted Sep 12, 2021 17:06 UTC (Sun) by excors (subscriber, #95769) [Link] (3 responses)

> The problem here is the cast, not the lack of a default clause. Why doesn't MISRA forbid that? That would actually make sense...

Hmm, it looks like MISRA C++:2008 already forbids that: "Rule 7-2-1: An expression with enum underlying type shall only have values corresponding to the enumerators of the enumeration". (That's based on C++03 and scoped enumerations are a C++11 feature, so it's talking about unscoped enumerations here.)

In that case, I think it would be feasible to have exhaustive switches over enums. But since it's different to the standard C++ rules, you'd need to suppress the compiler's "control reaches end of non-void function" warnings (and suppressing warnings seems generally dodgy when you care about safety), then add a static analysis tool to check the new rules. I don't know much about MISRA but I guess they didn't want to rely on tools that didn't exist yet.

> Because after all, the whole point of an enum type is that it can only hold one of a number of enumerated values. Therefore, when you encounter a value that isn't among them, your program is already in a state that the developers didn't foresee, and hence couldn't possibly know how to rectify.

According to the definition of C++, the point of an enum type is that it's basically an integer where some of the values have names. The developer is responsible for foreseeing states where an enum value doesn't match any name and deciding how to handle it, because those are well-defined states.

With unscoped enums, it seems common and widely accepted to use an enum type to contain a set of flags, so you'll bitwise-or two enumerators and get an enum value that doesn't equal any named enumerator.

With scoped enums, storing a combination of flags is allowed but is very awkward (because you need static_casts everywhere) and I think any sensible style guide would advise against it. But even then, it seems quite reasonable to e.g. define a struct with a scoped enum field and read it from disk or from a network socket or decode it from JSON/protobuf/etc, and it could have an arbitrary integer value that doesn't match any enumerator. Maybe you have some validation layer that rejects such messages as soon as possible, but the language doesn't give you any tools to help implement that (e.g. there's no reflection to let you find all the enumerator values) and standard static analysis tools won't help (because non-enumerator values don't violate type safety and aren't undefined behaviour), so there's a risk that non-enumerator values will leak into the rest of your program. To be safe, you should handle those values everywhere.
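A hand-written validation layer might look roughly like this (a sketch under my own assumptions; the `Command` enum and `parse_command` are hypothetical, and C++17 is assumed for std::optional):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical wire-format enum with gaps between the enumerators.
enum class Command : std::uint8_t { Start = 1, Stop = 2, Reset = 7 };

// There is no reflection to enumerate Command's values, so every enumerator
// is listed by hand. Leaving out the default label means -Wswitch can still
// flag this function when a new enumerator is added.
std::optional<Command> parse_command(std::uint8_t raw) {
    const Command candidate = static_cast<Command>(raw);  // always well-defined
    switch (candidate) {
        case Command::Start:
        case Command::Stop:
        case Command::Reset:
            return candidate;
    }
    return std::nullopt;  // raw byte does not correspond to any enumerator
}
```

Nothing stops other code from skipping `parse_command` and casting the raw byte directly, which is exactly the leak described above.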

Cro: Maintain it With Zig

Posted Sep 12, 2021 21:51 UTC (Sun) by HelloWorld (guest, #56129) [Link] (2 responses)

Actually the C++20 standard says this in Chapter 7.6.1.9, paragraph 10:
> A value of integral or enumeration type can be explicitly converted to a complete enumeration type. If the enumeration type has a fixed underlying type, the value is first converted to that type by integral conversion, if necessary, and then to the enumeration type. If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values (9.7.1), and otherwise, the behavior is undefined.
So it seems to me that it's impossible to create a value of an enum type other than the enumerators without previously invoking undefined behaviour (unless the enumeration type has a fixed underlying type). But I wonder if I'm misreading the standard here, because modern compilers should be able to exploit this, and yet they don't seem to. Something like this...
enum class Foo {
        Bar
};

auto f(Foo f) -> int {
        switch(f) {
                case Foo::Bar: return 42;
                default: return 23;
        }
}
... should just be compiled to mov eax, 42; ret according to my reading of the standard, but that's not what I get:
        test    edi, edi
        mov     ecx, 42
        mov     eax, 23
        cmove   eax, ecx
        ret
So I'm probably missing something here.

And you're right about the compiler warnings, that's a problem. But clang doesn't issue that diagnostic in such cases, and I think that's a good thing.

> With unscoped enums, it seems common and widely accepted to use an enum type to contain a set of flags, so you'll bitwise-or two enumerators and get an enum value that doesn't equal any named enumerator.
If you want a set of bits, I think std::bitset is the way to go.
> With scoped enums, storing a combination of flags is allowed but is very awkward (because you need static_casts everywhere) and I think any sensible style guide would advise against it. But even then, it seems quite reasonable to e.g. define a struct with a scoped enum field and read it from disk or from a network socket or decode it from JSON/protobuf/etc, and it could have an arbitrary integer value that doesn't match any enumerator. Maybe you have some validation layer that rejects such messages as soon as possible, but the language doesn't give you any tools to help implement that (e.g. there's no reflection to let you find all the enumerator values) and standard static analysis tools won't help (because non-enumerator values don't violate type safety and aren't undefined behaviour), so there's a risk that non-enumerator values will leak into the rest of your program. To be safe, you should handle those values everywhere.
Again, what are you going to do about it when you encounter a value other than the enumerators? That just means you had a bug in the part of your program that is supposed to validate the inputs, and now the program is in a state never expected or intended by the developer, so they couldn't possibly know what the correct way forward is.

Well, unless they actually do expect values other than the enumerators, in which case they should ask themselves why they're using an enum type in the first place. Anyway, the whole enum situation in C++ is a bit of a mess. I personally think that when you start thinking about the underlying representation of an enum type, you're probably operating at the wrong level of abstraction and should be using something other than an enum type, but that's not how the language is defined, apparently.

Cro: Maintain it With Zig

Posted Sep 12, 2021 22:19 UTC (Sun) by excors (subscriber, #95769) [Link] (1 responses)

> So it seems to me that it's impossible to create a value of an enum type other than the enumerators without previously invoking undefined behaviour (unless the enumeration type has a fixed underlying type).

"enum class Foo" is a scoped enumeration so it does have a fixed underlying type (defaulting to int), per C++20 9.7.1.5:

> Each enumeration defines a type that is different from all other types. Each enumeration also has an underlying type. The underlying type can be explicitly specified using an enum-base. For a scoped enumeration type, the underlying type is int if it is not explicitly specified. In both of these cases, the underlying type is said to be fixed.

The undefined behaviour only applies to an unscoped enum with no explicitly specified underlying type. And in that case "the range of the enumeration values" is not just the list of declared enumerators, it's a power-of-two-aligned range that includes the list of enumerators, per C++20 9.7.1.8:

> For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented. [...] It is possible to define an enumeration that has values not defined by any of its enumerators

(C++17 has a much more verbose definition but I think it has the same effect.)

> Again, what are you going to do about it when you encounter a value other than the enumerators?

If that case can only be triggered by a bug in your code, you could do the equivalent of assert(0), i.e. crash the process and let some other system (init, kernel, hardware watchdog, etc) recover cleanly - same as any other case where you detect a bug. That's safer than e.g. falling off the bottom of a non-void function and returning some garbage (which could be a security vulnerability).
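Roughly like this, as a sketch rather than a recommendation for any particular codebase; `Mode`, `mode_name` and `die_on_bad_enum` are names I invented, and C++17 is assumed for std::string_view:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string_view>

enum class Mode { Idle, Running };  // hypothetical enum for illustration

// Called only when a Mode value matches no enumerator, i.e. on a bug.
[[noreturn]] void die_on_bad_enum(int raw) {
    std::fprintf(stderr, "bug: unexpected Mode value %d\n", raw);
    assert(false && "unexpected Mode value");  // file and line in debug builds
    std::abort();                               // still fatal under NDEBUG
}

std::string_view mode_name(Mode m) {
    switch (m) {
        case Mode::Idle:    return "idle";
        case Mode::Running: return "running";
    }
    // Only reachable for non-enumerator values. Crashing here replaces
    // falling off the end of a non-void function.
    die_on_bad_enum(static_cast<int>(m));
}
```

Putting the crash after the switch rather than in a default label is a design choice: the function never returns garbage, and the compiler can still warn when an enumerator is left out of the switch.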

> Anyway, the whole enum situation in C++ is a bit of a mess.

I can't disagree with that :-)

Cro: Maintain it With Zig

Posted Sep 13, 2021 1:10 UTC (Mon) by HelloWorld (guest, #56129) [Link]

I see, thanks for pointing out the relevant sections of the standard. That does clear things up. I still think that enforcing a default clause is a bad idea because it prevents the compiler from issuing a warning when you miss an enumerator. Giving that up for an assert that is only ever going to do something if you've already messed up just doesn't seem like a good tradeoff. Especially given that crashing the process might not always be viable. There are situations where you need to keep going under all circumstances.

East const vs West const

Posted Sep 13, 2021 14:47 UTC (Mon) by dskoll (subscriber, #1630) [Link] (1 responses)

OMG! I had no idea these styles had names! I'm a strong proponent of East const, which makes way more sense to me than West const. Now I feel like I'm part of a community instead of a lone programmer tilting against windmills. :)

East const vs West const

Posted Sep 16, 2021 8:25 UTC (Thu) by ncm (guest, #165) [Link]

West is favored by people who wish they were coding C; east by people glad to be coding C++.

East "*" is generally worse. It is favored by compiler writers and people who glory in being a PITA, and practically nobody else.

But it is not hard to read any of them. Or, even, mixed on the same page; it just looks untidy. There are worse sins. Clang-format fixes everything without fuss.

