
Would you like signs with those chars?

Posted Oct 24, 2022 21:14 UTC (Mon) by NYKevin (subscriber, #129325)
In reply to: Would you like signs with those chars? by cesarb
Parent article: Would you like signs with those chars?

More to the point, it better matches reality. A byte is 8 bits, not a number. You can use it to store numbers, but in practice it will be a component of a short or int (or one unit of a UTF-8 encoded code point, which sometimes happens to be only one byte long), not a number in its own right. I don't picture bytes magically getting sign-extended when I do bitwise operations on them, and I almost never use char for math or counting. I suppose there might be situations where you are extremely sure that you'll never need to count higher than 127 or lower than -128, but it's difficult to imagine a specific example (one that does not involve the phrase "for historical reasons").


Would you like signs with those chars?

Posted Oct 24, 2022 22:13 UTC (Mon) by Sesse (subscriber, #53779) (4 responses)

It's fairly common to use these types to save memory. Just to take a random example from work: A counting Bloom filter will almost never need to count higher than 255, so why waste four times the memory (and cache space)?

I do wish C had made a separate “byte” type, though, for aliasing reasons. char has too many tasks.

Would you like signs with those chars?

Posted Oct 24, 2022 23:26 UTC (Mon) by NYKevin (subscriber, #129325)

1. short is very often good enough for use cases like that. Not always, but often.
2. If you must use char, unsigned will work perfectly well (does your Bloom filter have a negative count?!), so this isn't actually a use case for signed char.
3. If you need a sentinel value, you can use 255; there is no magical rule that says sentinel values have to be negative.

Would you like signs with those chars?

Posted Oct 25, 2022 7:30 UTC (Tue) by gspr (guest, #91542) (2 responses)

Re the aliasing of char: wouldn't uint8_t be a nicely named type for your use?

Would you like signs with those chars?

Posted Oct 25, 2022 7:34 UTC (Tue) by Sesse (subscriber, #53779) (1 responses)

It would, except that it maps onto unsigned char, which can alias anything as things stand. And it's about fifty years too late to change that :-)

Would you like signs with those chars?

Posted Oct 25, 2022 19:18 UTC (Tue) by wahern (subscriber, #37304)

uint8_t isn't required to be a typedef for unsigned char. An implementation could choose to make it a distinct type, precisely to avoid the aliasing behavior of char. In GCC's case, it seems that C++ may be partly to blame for the current state of affairs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13

Would you like signs with those chars?

Posted Oct 25, 2022 5:09 UTC (Tue) by SLi (subscriber, #53131) (4 responses)

I really wonder (but am too lazy to Google now) what the history of signed chars even is. It always seemed like a weird thing to me. Was it some kind of wish to have a type that represents "all the characters we care about" plus -1 for EOF, back before the relevant routines returned int?

Would you like signs with those chars?

Posted Oct 25, 2022 16:15 UTC (Tue) by khim (subscriber, #9252)

I suspect it was simply added when the C standard committee realized that on some platforms plain char is unsigned. They needed a single-byte signed type, and thus signed char was born.

Would you like signs with those chars?

Posted Oct 27, 2022 2:09 UTC (Thu) by gdt (subscriber, #6284) (2 responses)

The problem being solved is moving a byte in memory to a word-sized register. If you want to do that load in one processor instruction then you have to accept the processor's choice of sign extension or otherwise as that byte is expanded into the register.

If the language insists on "char" being "unsigned char" then some processors will need to follow the register load with an AND instruction to clear the sign extension. If loading that register also sets register flags (eg, Negative) then you'll need to clear those register flags too. You could, of course, perhaps avoid this with careful compiler optimisations, but that's asking too much of the compilers of the era.

Well before the ANSI standards committee started work, the convention in C was to let these differences in processor implementations shine through, with an obligation on people writing code intended to be 'portable' between differing processors to deal with the results. Considering that C was a systems programming language, this wasn't an unreasonable choice.

Would you like signs with those chars?

Posted Oct 27, 2022 6:01 UTC (Thu) by SLi (subscriber, #53131)

Ah, right, makes total sense. Thank you!

Would you like signs with those chars?

Posted Oct 27, 2022 7:18 UTC (Thu) by joib (subscriber, #8541)

https://trofi.github.io/posts/203-signed-char-or-unsigned... has some investigation of this issue. It turns out that quite a few architectures provide only zero-extending byte loads, yet their ABIs have chosen chars to be signed, thus requiring an extra instruction to patch things up.

(That blog post is a few years old and doesn't include results for RISC-V, but I understand that RISC-V is like ARM, in that it provides both zero-extending and sign-extending byte loads, but the ABI has chosen chars to be unsigned.)


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds