Also, a buffer overrun in any other language will generally imply the application is buggy and will yield at the very least corrupt results, and similarly for NULL pointer dereferencing and other off-the-rails events. C might just blow up a bit harder, and unfortunately make a bigger hole as a result. There are plenty of remedies outside the C language; there are different defense vectors just as there are different attack vectors. The worst and weakest C wart, IMHO, is \0-terminated strings. As a result we get duplicate mem and str libraries, already a good indication something is amiss. Reasoning about \0-terminated strings is hard, not because of buffer sizes, but because you always have to make sure that no stowaway \0 can possibly be present inside your string.
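Here is a minimal sketch of what "making sure no stowaway \0 is present" can look like. It assumes the data arrives as a buffer plus an explicit byte count (as from read() or fread()). The helper names has_stowaway_nul and visible_length are hypothetical. Both use memchr(), which searches exactly len bytes, to find the first \0. Anything after that byte is invisible to strlen() and the other str* functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..len) contains a '\0' byte, i.e. if
 * str* functions would stop early and see only a prefix of the data. */
bool has_stowaway_nul(const char *buf, size_t len)
{
    return len > 0 && memchr(buf, '\0', len) != NULL;
}

/* Hypothetical helper: number of bytes a str* function would actually see,
 * i.e. the offset of the first '\0', or len if there is none. */
size_t visible_length(const char *buf, size_t len)
{
    const char *nul = len > 0 ? memchr(buf, '\0', len) : NULL;
    return nul ? (size_t)(nul - buf) : len;
}
```

Take the 5-byte buffer {'a', 'b', '\0', 'c', 'd'}. For it, has_stowaway_nul() returns true and visible_length() returns 2, which matches what strlen() reports. Any str* function would therefore silently drop the last two bytes.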
What every C Programmer should know about undefined behavior #2/3
Posted May 17, 2011 15:07 UTC (Tue) by dgm (subscriber, #49227)
> C might just blow up a bit harder, and unfortunately make a bigger hole as a result.
I don't see why. What's the magic that makes C "blow harder" than any other language where a NULL pointer dereference is possible (assembler or Pascal, for example)?
> As a result we get duplicate mem and str libraries, already a good indication something is amiss.
Nonsense. Pascal, for instance, does not use zero-terminated strings (it uses size-prefixed strings), and neither do other languages such as C++, C#, or Java. Yet all of them have separate facilities for dealing with strings and arbitrary buffers.
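For comparison, here is a minimal sketch of a length-prefixed byte string in C, in the spirit of the size-prefixed strings mentioned above. The lpstr type and its functions are hypothetical and do not come from any library. The length is stored alongside the data, so an embedded \0 is just another byte.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length is stored explicitly,
 * so the data may contain any byte value, including '\0'. */
typedef struct {
    size_t len;
    char data[];   /* flexible array member (C99); not NUL-terminated */
} lpstr;

/* Allocate an lpstr holding a copy of buf[0..len). Returns NULL on failure. */
lpstr *lpstr_new(const char *buf, size_t len)
{
    lpstr *s = malloc(sizeof *s + len);
    if (s == NULL)
        return NULL;
    s->len = len;
    if (len > 0)
        memcpy(s->data, buf, len);
    return s;
}

/* Length is O(1) and counts every byte, unlike strlen(). */
size_t lpstr_len(const lpstr *s)
{
    return s->len;
}

/* Byte-wise equality: compare lengths first, then all bytes with memcmp(). */
int lpstr_equal(const lpstr *a, const lpstr *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Consider the 5-byte inputs "ab\0cd" and "ab\0zz". Calling strcmp() on the raw data reports them equal, because it stops at the \0. lpstr_equal() compares all five bytes and sees the difference. The flexible array member keeps the length and the bytes in one allocation, so a single free() releases both.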
> Reasoning about \0-terminated strings is hard, not because of buffer sizes, but because you always have to make sure that no stowaway \0 can possibly be present inside your string.
"Hard" is relative. I have foggy memories of having had some problem with an embedded zero in a string in my first week of writing C, some 20 years ago, but never after that.
Posted May 17, 2011 19:06 UTC (Tue) by stijn (subscriber, #570)
Given that I am stuck where you were 20 years ago, you are bound to be right - thanks, that clarified matters greatly!
Posted May 17, 2011 21:57 UTC (Tue) by baldridgeec (guest, #55283)
For your last point, basically it comes down to "if there might be a \0 in valid input for this function, even though it may be a byte[], it is not a string."
So you use memchr() instead of strchr().
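Here is a small illustration. It assumes a buffer that legitimately holds a \0 before the byte being searched for. strchr() stops at the first \0 and reports no match, while memchr() searches exactly as many bytes as it is told to. find_byte is a hypothetical wrapper, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: index of the first occurrence of byte c in
 * buf[0..len), or -1 if absent. Because it uses memchr(), '\0' bytes
 * inside the buffer do not end the search. */
ptrdiff_t find_byte(const char *buf, size_t len, char c)
{
    const char *p = len > 0 ? memchr(buf, (unsigned char)c, len) : NULL;
    return p ? p - buf : -1;
}
```

Take the 6-byte buffer {'k', 'e', 'y', '\0', '=', 'v'}. For it, strchr(buf, '=') returns NULL, while find_byte(buf, sizeof buf, '=') returns 4.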
Or you use C++ and call it a string instead of a byte[] (and use the actual C++ string-manipulation functions, not strchr; g++ warns about (byte[])string as an invalid cast nowadays anyway) :)
Posted May 19, 2011 14:49 UTC (Thu) by dgm (subscriber, #49227)
Your problem seems to be that you want to use the C string library to handle arrays of arbitrary bytes. It will not work; that's not what it was designed to do.
Posted May 19, 2011 15:49 UTC (Thu) by stijn (subscriber, #570)
Just to clarify, I can deal with both the string library and the mem functions. When I come to LWN, I hope to engage in constructive conversation, not chest-beating.