
NOT a Language misfeature, it's a Programming Error

Posted Jun 28, 2011 10:30 UTC (Tue) by lacos (subscriber, #70616)
In reply to: Language misfeature by eru
Parent article: A hole in crypt_blowfish

Anybody who has ever looked at the C89 standard, section "6.1.2.5 Types" (I'm even considering the age of the program in question here), knows that "char" may have a value representation equivalent to that of "char signed".

paragraph 2:

----v----
An object declared as type _char_ is large enough to store any member of the basic execution character set. If a member of the required source character set enumerated in 5.2.1 is stored in a _char_ object, its value is guaranteed to be positive. If other quantities are stored in a char object, the behavior is implementation-defined; the values are treated as either signed or nonnegative integers.
----^----

The real fuckup here is that "ptr" was declared pointer-to-char, instead of pointer-to-char-unsigned. "char unsigned" is the type to access binary data or the object representation of objects.
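
To make the failure mode concrete, here is a minimal sketch. The helpers are hypothetical and are not the actual crypt_blowfish code. They pack four bytes into a 32-bit value, once through "signed char" (to model a platform where plain "char" is signed, assuming two's complement) and once through "unsigned char":

```c
/* Hypothetical helpers for illustration; not taken from crypt_blowfish. */

/* Pack four bytes, most significant first, reading them as signed char.
 * This models a platform where plain char is signed (two's complement
 * assumed). A byte such as 0x80 is read as -128, and converting that to
 * unsigned long sets every higher bit before the OR. */
unsigned long pack_via_signed_char(const void *data)
{
    const signed char *ptr = data;
    unsigned long tmp = 0;
    int i;

    for (i = 0; i < 4; i++) {
        tmp <<= 8;
        tmp |= (unsigned long)ptr[i];   /* sign extension happens here */
    }
    return tmp & 0xFFFFFFFFUL;          /* keep the low 32 bits */
}

/* Same loop, but the bytes are read as unsigned char, so each one
 * contributes exactly its 8 bits and nothing else. */
unsigned long pack_via_unsigned_char(const void *data)
{
    const unsigned char *ptr = data;
    unsigned long tmp = 0;
    int i;

    for (i = 0; i < 4; i++) {
        tmp <<= 8;
        tmp |= (unsigned long)ptr[i];
    }
    return tmp & 0xFFFFFFFFUL;
}
```

For the input bytes 01 80 02 03, the unsigned version yields 0x01800203, while the signed version yields 0xFF800203: the byte with the high bit set clobbers everything that was packed before it.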

"char" (which is a different type from "char signed", but may have identical value representation) does not even have to be able to represent more than 255 (NOT 256!) distinct values, if we rely on nothing else than the C89 standard. This accommodates sign-magnitude and one's complement representations. See C89 "5.2.4.2.1 Sizes of integral types <limits.h>":

----v----
[...] Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

- number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

- minimum value for an object of type signed char
SCHAR_MIN -127

- maximum value for an object of type signed char
SCHAR_MAX +127

- maximum value for an object of type unsigned char
UCHAR_MAX 255

- minimum value for an object of type char
CHAR_MIN see below

- maximum value for an object of type char
CHAR_MAX see below

[...]

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX.
----^----
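
In practice, a program can ask <limits.h> which of the two cases applies on the implementation at hand. A small sketch (the function name is made up for illustration):

```c
#include <limits.h>

/* Hypothetical helper: returns nonzero if plain char behaves as a signed
 * type on this implementation. Per the text quoted above, CHAR_MIN equals
 * SCHAR_MIN (negative) in that case and 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that only works for one of the two answers is exactly the kind of code that should have used "char unsigned" in the first place.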

"char *" is there so you can work with _TEXT_ that consists of elements of the (basic or extended) execution character set. "char unsigned *" is there for everything else "binary". If in doubt, use "char unsigned *".

Another example: reading something from a socket and using string functions (like strstr(), strcmp(), etc.) on the result is _broken_ from a portability standpoint.
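
One way to read this advice: treat received data as a counted array of "char unsigned" and search it with memcmp(), instead of pretending it is a NUL-terminated text string. A sketch, with a hypothetical helper name and no claim that this is how any particular program does it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of needle in hay, both
 * given as counted byte arrays. Returns the offset of the match, or -1 if
 * there is none (an empty needle is also reported as -1). Embedded zero
 * bytes are ordinary data here, unlike with strstr(). */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    size_t i;

    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (i = 0; i <= hay_len - needle_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

With the four bytes 'a', 0, 'b', 'c', searching for "bc" finds offset 2, whereas strstr() stops at the zero byte and reports no match. Whether the bytes mean the same characters as in the host's execution character set is a separate question that this sketch does not address.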



NOT a Language misfeature, it's a Programming Error

Posted Jun 29, 2011 6:34 UTC (Wed) by eru (subscriber, #2753)

I think you missed my point. I do agree it is a programming error. But this error (and countless others like it) was made much easier to commit by this language misfeature, which makes straightforward code work in a very non-intuitive way.

By the way, unsigned char is a fairly "new" invention. The original K&R C did not have it, just char with implementation-dependent signedness. Therefore there is probably still legacy code around with bugs like this, because char was the only way to represent a byte and the coders did not remember to put masking around every widening access.
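
A sketch of the kind of masking meant here, assuming 8-bit bytes and two's complement (the helper name is hypothetical):

```c
/* Hypothetical helper: recover the byte value 0..255 from a plain char
 * regardless of whether char is signed on this implementation. Without
 * the mask, a char holding 0x80 would widen to a negative int on
 * platforms where char is signed. */
unsigned int byte_value(char c)
{
    return (unsigned int)c & 0xFFu;
}
```

Every place where a char is widened to int or long needs this treatment, and forgetting it in a single spot is enough to reintroduce the bug.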


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds