
Cro: Maintain it With Zig

Posted Sep 12, 2021 14:45 UTC (Sun) by HelloWorld (guest, #56129)
In reply to: Cro: Maintain it With Zig by excors
Parent article: Cro: Maintain it With Zig

> I think that's incorrect, because it's legal to cast an integer to an "enum class" type even if it's not one of the declared enumerators.
The problem here is the cast, not the lack of a default clause. Why doesn't MISRA forbid that? That would actually make sense...

Besides, what useful thing could you possibly do in such a default clause? After all, the whole point of an enum type is that it can only hold one of a number of enumerated values. Therefore, when you encounter a value that isn't among them, your program is already in a state that the developers didn't foresee, and hence couldn't possibly know how to rectify.



Cro: Maintain it With Zig

Posted Sep 12, 2021 17:06 UTC (Sun) by excors (subscriber, #95769)

> The problem here is the cast, not the lack of a default clause. Why doesn't MISRA forbid that? That would actually make sense...

Hmm, it looks like MISRA C++:2008 already forbids that: "Rule 7-2-1: An expression with enum underlying type shall only have values corresponding to the enumerators of the enumeration". (That's based on C++03 and scoped enumerations are a C++11 feature, so it's talking about unscoped enumerations here.)

In that case, I think it would be feasible to have exhaustive switches over enums. But since it's different to the standard C++ rules, you'd need to suppress the compiler's "control reaches end of non-void function" warnings (and suppressing warnings seems generally dodgy when you care about safety), then add a static analysis tool to check the new rules. I don't know much about MISRA but I guess they didn't want to rely on tools that didn't exist yet.

> Because after all, the whole point of an enum type is that it can only hold one of a number of enumerated values. Therefore, when you encounter a value that isn't among them, your program is already in a state that the developers didn't foresee, and hence couldn't possibly know how to rectify.

According to the definition of C++, the point of an enum type is that it's basically an integer where some of the values have names. The developer is responsible for foreseeing states where an enum value doesn't match any name and deciding how to handle it, because those are well-defined states.

With unscoped enums, it seems common and widely accepted to use an enum type to contain a set of flags, so you'll bitwise-or two enumerators and get an enum value that doesn't equal any named enumerator.
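
To make the flags case concrete, here is a minimal sketch. The `Permission` enum and the helper names are made up for illustration, and the enum is given a fixed underlying type so that every bit combination is a valid value.

```cpp
#include <cassert>

// Hypothetical flag set; the names are illustrative only.
enum Permission : unsigned {
    Read  = 1u << 0,
    Write = 1u << 1,
    Exec  = 1u << 2,
};

// Bitwise-or promotes both operands to the underlying integer type,
// so the result has to be cast back to the enum type explicitly.
constexpr Permission combine(Permission a, Permission b) {
    return static_cast<Permission>(a | b);
}

// True only when 'p' is exactly one of the three declared enumerators.
constexpr bool is_named(Permission p) {
    return p == Read || p == Write || p == Exec;
}

static_assert(!is_named(combine(Read, Write)), "Read|Write matches no enumerator");

// Usage check: the combined value carries the Write bit but has no name.
inline void check_flags() {
    const Permission rw = combine(Read, Write);
    assert((rw & Write) != 0);
    assert(!is_named(rw));
}
```

A switch over `Permission` that lists only `Read`, `Write` and `Exec` would therefore not cover `rw`.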

With scoped enums, storing a combination of flags is allowed but is very awkward (because you need static_casts everywhere) and I think any sensible style guide would advise against it. But even then, it seems quite reasonable to e.g. define a struct with a scoped enum field and read it from disk or from a network socket or decode it from JSON/protobuf/etc, and it could have an arbitrary integer value that doesn't match any enumerator. Maybe you have some validation layer that rejects such messages as soon as possible, but the language doesn't give you any tools to help implement that (e.g. there's no reflection to let you find all the enumerator values) and standard static analysis tools won't help (because non-enumerator values don't violate type safety and aren't undefined behaviour), so there's a risk that non-enumerator values will leak into the rest of your program. To be safe, you should handle those values everywhere.
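
A validation layer of the kind described above might look roughly like this. It is only a sketch: `MessageKind` and `parse_kind` are hypothetical names, and C++17 is assumed for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical wire-format enum; names and values are illustrative.
enum class MessageKind : std::uint8_t {
    Ping  = 0,
    Data  = 1,
    Close = 2,
};

// Converts an untrusted byte to MessageKind, or returns std::nullopt.
// The case list has to be kept in sync with the enum by hand, because
// the language offers no way to iterate over the enumerators.
inline std::optional<MessageKind> parse_kind(std::uint8_t raw) {
    // Well-defined for every byte: the underlying type is fixed.
    const auto kind = static_cast<MessageKind>(raw);
    switch (kind) {
        case MessageKind::Ping:
        case MessageKind::Data:
        case MessageKind::Close:
            return kind;
    }
    return std::nullopt;  // anything else is rejected at the boundary
}

// Usage check: a known value passes, an unknown one is rejected.
inline void check_parse_kind() {
    assert(parse_kind(1) == MessageKind::Data);
    assert(!parse_kind(7).has_value());
}
```

Nothing stops other code from doing the `static_cast` directly and skipping `parse_kind`, which is how non-enumerator values can still leak further in.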

Cro: Maintain it With Zig

Posted Sep 12, 2021 21:51 UTC (Sun) by HelloWorld (guest, #56129)

Actually the C++20 standard says this in Chapter 7.6.1.9, paragraph 10:
> A value of integral or enumeration type can be explicitly converted to a complete enumeration type. If the enumeration type has a fixed underlying type, the value is first converted to that type by integral conversion, if necessary, and then to the enumeration type. If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values (9.7.1), and otherwise, the behavior is undefined.
So it seems to me that it's impossible to create a value of an enum type other than the enumerators without previously invoking undefined behaviour (unless the enumeration type has a fixed underlying type). But I wonder if I'm misreading the standard here, because modern compilers should be able to exploit this, and yet they don't seem to. Something like this...
enum class Foo {
        Bar
};

auto f(Foo f) -> int {
        switch(f) {
                case Foo::Bar: return 42;
                default: return 23;
        }
}
... should just be compiled to mov eax, 42; ret according to my reading of the standard, but that's not what I get:
        test    edi, edi
        mov     ecx, 42
        mov     eax, 23
        cmove   eax, ecx
        ret
So I'm probably missing something here.

And you're right about the compiler warnings, that's a problem. But clang doesn't issue that diagnostic in such cases, and I think that's a good thing.

> With unscoped enums, it seems common and widely accepted to use an enum type to contain a set of flags, so you'll bitwise-or two enumerators and get an enum value that doesn't equal any named enumerator.
If you want a set of bits, I think std::bitset is the way to go.
> With scoped enums, storing a combination of flags is allowed but is very awkward (because you need static_casts everywhere) and I think any sensible style guide would advise against it. But even then, it seems quite reasonable to e.g. define a struct with a scoped enum field and read it from disk or from a network socket or decode it from JSON/protobuf/etc, and it could have an arbitrary integer value that doesn't match any enumerator. Maybe you have some validation layer that rejects such messages as soon as possible, but the language doesn't give you any tools to help implement that (e.g. there's no reflection to let you find all the enumerator values) and standard static analysis tools won't help (because non-enumerator values don't violate type safety and aren't undefined behaviour), so there's a risk that non-enumerator values will leak into the rest of your program. To be safe, you should handle those values everywhere.
Again, what are you going to do about it when you encounter a value other than the enumerators? That just means you had a bug in the part of your program that is supposed to validate the inputs, and now the program is in a state never expected or intended by the developer, so they couldn't possibly know what the correct way forward is.

Well, unless they actually do expect values other than the enumerators, in which case they should ask themselves why they're using an enum type in the first place. Anyway, the whole enum situation in C++ is a bit of a mess. I personally think that when you start thinking about the underlying representation of an enum type, you're probably operating at the wrong level of abstraction and should be using something other than an enum type, but that's not how the language is defined, apparently.

Cro: Maintain it With Zig

Posted Sep 12, 2021 22:19 UTC (Sun) by excors (subscriber, #95769)

> So it seems to me that it's impossible to create a value of an enum type other than the enumerators without previously invoking undefined behaviour (unless the enumeration type has a fixed underlying type).

"enum class Foo" is a scoped enumeration so it does have a fixed underlying type (defaulting to int), per C++20 9.7.1.5:

> Each enumeration defines a type that is different from all other types. Each enumeration also has an underlying type. The underlying type can be explicitly specified using an enum-base. For a scoped enumeration type, the underlying type is int if it is not explicitly specified. In both of these cases, the underlying type is said to be fixed.

The undefined behaviour only applies to an unscoped enum with no explicitly specified underlying type. And in that case "the range of the enumeration values" is not just the list of declared enumerators, it's a power-of-two-aligned range that includes the list of enumerators, per C++20 9.7.1.8:

> For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented. [...] It is possible to define an enumeration that has values not defined by any of its enumerators

(C++17 has a much more verbose definition but I think it has the same effect.)
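
A small worked example of that range rule, as I read it. The `Level` enum is made up for illustration, and the 0..7 figure follows from the quoted wording rather than from anything a compiler is required to report.

```cpp
#include <cassert>

// Hypothetical unscoped enum with no fixed underlying type.
// Its enumerators are 1 and 5; the narrowest hypothetical integer type
// that can represent both is 3 bits wide, so the values are 0..7.
enum Level { Low = 1, High = 5 };

// 6 is not an enumerator but lies inside 0..7, so the conversion is
// well-defined and can appear in a constant expression.
constexpr Level in_range = static_cast<Level>(6);
static_assert(in_range != Low && in_range != High, "6 matches no enumerator");

// By contrast, static_cast<Level>(8) would fall outside 0..7 and be
// undefined behaviour, so it is deliberately not written here.

// Usage check: the value survives the round trip unchanged.
inline void check_range() {
    assert(static_cast<int>(in_range) == 6);
}
```

So even in the one case where the cast can be undefined, a switch listing only `Low` and `High` still misses six well-defined values.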

> Again, what are you going to do about it when you encounter a value other than the enumerators?

If that case can only be triggered by a bug in your code, you could do the equivalent of assert(0), i.e. crash the process and let some other system (init, kernel, hardware watchdog, etc) recover cleanly - same as any other case where you detect a bug. That's safer than e.g. falling off the bottom of a non-void function and returning some garbage (which could be a security vulnerability).
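
As a sketch of what I mean, with a made-up `Color` enum: `std::abort()` is used instead of `assert(0)` here because `assert` is compiled out when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical enum; the names and numbers are illustrative.
enum class Color { Red, Green };

inline int brightness(Color c) {
    switch (c) {
        case Color::Red:   return 10;
        case Color::Green: return 20;
        default:
            // Only reachable if a non-enumerator value leaked in.
            // Terminate instead of returning a garbage result.
            std::abort();
    }
}

// Usage check: declared enumerators take the normal paths.
inline void check_brightness() {
    assert(brightness(Color::Red) == 10);
    assert(brightness(Color::Green) == 20);
}
```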

> Anyway, the whole enum situation in C++ is a bit of a mess.

I can't disagree with that :-)

Cro: Maintain it With Zig

Posted Sep 13, 2021 1:10 UTC (Mon) by HelloWorld (guest, #56129)

I see, thanks for pointing out the relevant sections of the standard. That does clear things up. I still think that enforcing a default clause is a bad idea because it prevents the compiler from issuing a warning when you miss an enumerator. Giving that up for an assert that is only ever going to do something if you've already messed up just doesn't seem like a good tradeoff. Especially given that crashing the process might not always be viable. There are situations where you need to keep going under all circumstances.
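
For what it's worth, here is a rough sketch of a switch that keeps the missing-enumerator warning (`-Wswitch` in GCC and Clang) by having no default label, with the handling of unexpected values moved after the switch. The `Mode` enum is made up, and `std::abort()` is only a placeholder for whatever recovery the system can actually afford.

```cpp
#include <cassert>
#include <cstdlib>
#include <string_view>

// Hypothetical enum; the names are illustrative.
enum class Mode { Off, On };

inline std::string_view mode_name(Mode m) {
    // No default label: if an enumerator is added to Mode without a
    // matching case here, -Wswitch can flag the omission.
    switch (m) {
        case Mode::Off: return "off";
        case Mode::On:  return "on";
    }
    // Reached only for a value that matches no enumerator.
    std::abort();
}

// Usage check: declared enumerators never reach the fallback.
inline void check_mode_name() {
    assert(mode_name(Mode::Off) == "off");
    assert(mode_name(Mode::On) == "on");
}
```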


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.