Would you like signs with those chars?

Posted Oct 24, 2022 22:45 UTC (Mon) by wahern (subscriber, #37304)
In reply to: Would you like signs with those chars? by NYKevin
Parent article: Would you like signs with those chars?

> That isn't the problem, as far as I can tell.

I can't find conclusive examples for the is* ctype routines, but here is how tolower was defined during the first few releases of OpenBSD, as forked from NetBSD:

#define tolower(c) ((_tolower_tab_ + 1)[c])

It's still defined similarly on NetBSD today: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/ctype_inl...

Also, EOF is a permitted value and typically -1 (thus the +1 in the above), though that would typically only be an issue for non-C locales.
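To make the +1 concrete, here is a minimal sketch of a table-driven tolower in the same shape as the macro above. The names my_tolower_tab, MY_TOLOWER and my_tolower_init are made up for illustration, and the table is filled for an ASCII "C" locale only; this is not the actual BSD table.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical table: slot 0 holds the result for EOF,
 * slots 1..UCHAR_MAX+1 hold the results for 0..UCHAR_MAX. */
static short my_tolower_tab[UCHAR_MAX + 2];

/* Same shape as the quoted macro: the +1 moves index -1 (EOF) to slot 0. */
#define MY_TOLOWER(c) ((my_tolower_tab + 1)[c])

/* Fill the table, assuming ASCII (contiguous 'A'..'Z') and the "C" locale. */
static void my_tolower_init(void)
{
    assert(EOF == -1);   /* the +1 trick only works if EOF is -1 */
    my_tolower_tab[0] = EOF;
    for (int c = 0; c <= UCHAR_MAX; c++)
        my_tolower_tab[c + 1] =
            (short)((c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c);
}
```

After my_tolower_init(), MY_TOLOWER('A') gives 'a' and MY_TOLOWER(EOF) gives EOF. Any other negative index still reads before the start of the table.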



Would you like signs with those chars?

Posted Oct 24, 2022 23:41 UTC (Mon) by NYKevin (subscriber, #129325)

> #define tolower(c) ((_tolower_tab_ + 1)[c])

Even so, I don't believe that the standard *actually* says that c has to be unsigned in that expression - just that the "usual arithmetic conversions" happen (i.e. the compiler magicks it into an int when you're not looking). Compilers presumably added that warning because there were instances of arrays being indexed with negative char, but not negative int or any other signed type. And, again, that presumably had something to do with ASCII supersets and other nonsense involving dirty 7-bit channels.

> Also, EOF is a permitted value and typically -1 (thus the +1 in the above), though that would typically only be an issue for non-C locales.

The argument is of type int (according to the standard, not that untyped macro), not char, so it's completely unambiguous: You are allowed to pass negative numbers to those routines, because int is always signed, and if it is implemented as a macro, it has to accept signed values in the int range. Of course, if you pass negatives other than EOF (or whatever EOF is #define'd to), then the standard presumably gives you UB (which is why it's OK for the array implementation to walk off the end in that case).

Would you like signs with those chars?

Posted Oct 25, 2022 0:22 UTC (Tue) by wahern (subscriber, #37304)

The C standard says, "The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined."
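In practice that wording is why the usual advice is to convert a plain char to unsigned char before handing it to a <ctype.h> function. A minimal sketch, with hypothetical helper names ctype_arg_ok and lower_char:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper: nonzero if v is a valid <ctype.h> argument,
 * i.e. EOF or a value representable as unsigned char. */
static int ctype_arg_ok(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Hypothetical wrapper: lowercase a plain char without risking UB,
 * whether char is signed or unsigned on this platform. */
static char lower_char(char ch)
{
    int arg = (unsigned char)ch;   /* always lands in 0..UCHAR_MAX */
    assert(ctype_arg_ok(arg));
    return (char)tolower(arg);
}
```

Passing ch straight to tolower() would only be valid for values that happen to be non-negative; on a signed-char platform, a byte such as 0xE9 becomes a negative int that is neither EOF nor representable as unsigned char.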

Would you like signs with those chars?

Posted Oct 25, 2022 15:13 UTC (Tue) by mrvn42 (guest, #161806)

>> #define tolower(c) ((_tolower_tab_ + 1)[c])
>> Also, EOF is a permitted value and typically -1 (thus the +1 in the above), though that would typically only be an issue for non-C locales.

> The argument is of type int (according to the standard, not that untyped macro), not char, so it's completely unambiguous: You are allowed to pass negative numbers to those routines, because int is always signed, and if it is implemented as a macro, it has to accept signed values in the int range. Of course, if you pass negatives other than EOF (or whatever EOF is #define'd to), then the standard presumably gives you UB (which is why it's OK for the array implementation to walk off the end in that case).

The problem is that this only works for values between -1 and 127 if the array is 129 bytes long. A value of -2 (or any other non-ASCII value other than EOF, with signed char) would access memory before the array, and a value of 255 (EOF mistakenly stored in an unsigned char, or anything non-ASCII) would access memory after the array.

Looking at the source linked in the other comment, the BSD code seems to assume chars are unsigned. The test for ASCII doesn't work with signed chars at all.

So I assume that "_tolower_tab_" is actually 257 bytes long to cover all unsigned chars and EOF (which is -1 when stored as int).
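The index arithmetic can be checked without touching any real table. This is only a sketch: slot_for, in_table and checked_slot are hypothetical helpers, and the sizes 129 and 257 are the two guesses discussed above, not values taken from the BSD source.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical: slot used by (tab + 1)[c] in a table laid out as
 * [EOF, 0, 1, ..., last]. */
static int slot_for(int c)
{
    return c + 1;
}

/* Nonzero if the slot for c lies inside a table of n entries. */
static int in_table(int c, int n)
{
    int s = slot_for(c);
    return s >= 0 && s < n;
}

/* Like slot_for(), but aborts instead of producing an out-of-range slot. */
static int checked_slot(int c, int n)
{
    assert(in_table(c, n));
    return slot_for(c);
}
```

With 129 entries, -1 and 127 are in range, while -2 and 255 are not. With 257 entries, 255 is in range, but -2 still falls before the table, so a negative signed char remains a problem either way.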


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds