Still no strlcpy and friends

Posted Mar 22, 2012 21:43 UTC (Thu) by slashdot (guest, #22014)
In reply to: Still no strlcpy and friends by HelloWorld
Parent article: GNU libc 2.15 released

std::string is best.

Concatenation:
a += b

Copy:
a = b



Still no strlcpy and friends

Posted Mar 22, 2012 22:11 UTC (Thu) by david.a.wheeler (guest, #72896) [Link]

Strange, "a += b" doesn't seem to work properly in my .c file.

I've recommended adding strlcpy and friends, and I'm sad that glibc still fails to include them. While string truncation is a potential problem, unbounded allocation has its own problems (because it's, well, unbounded). The OpenBSD folks - who are big on security - specifically advocate strlcpy. I don't buy the argument that "strlcpy is a security problem" - it helps you deal with security problems.

Whenever you have buffers, you can either try to grow them unboundedly (which eventually fails) or cut them off at some point (strlcpy and friends). We need tools to let developers easily do either.

Still no strlcpy and friends

Posted Mar 23, 2012 3:32 UTC (Fri) by kevinm (guest, #69913) [Link]

It's not about doing things unbounded.

It's that if you write the code around your use of strlcpy() to detect and respond to truncation (instead of just ignoring it), you will find that you've written all the code you would need to just use memcpy() instead.
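
A minimal sketch of what that looks like in C. The helper name copy_checked is hypothetical (it is not a libc function), and the sketch assumes src is NUL-terminated and dst has room for dst_size bytes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits entirely.
 * Returns 0 on success, -1 if src plus its NUL needs more than
 * dst_size bytes. On failure dst is left untouched. */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);       /* first pass: find the NUL */

    if (len >= dst_size)
        return -1;                  /* would truncate: report it instead */
    memcpy(dst, src, len + 1);      /* second pass: copy including the NUL */
    return 0;
}
```

Once the length is known for the truncation check, the copy itself is just a memcpy() of len + 1 bytes.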

Still no strlcpy and friends

Posted Mar 23, 2012 6:54 UTC (Fri) by smurf (subscriber, #17840) [Link]

memcpy() doesn't find the terminating NUL for you.

If you want to carry the string length around, use C++ and std::string.
I have no problem with that. But let the rest of us C programmers use the functions that actually make sense.

Still no strlcpy and friends

Posted Mar 23, 2012 9:06 UTC (Fri) by khim (subscriber, #9252) [Link]

> But let the rest of us C programmers use the functions that actually make sense.

Well, strlcpy does not make any sense in UTF-8 world thus your wish is obviously granted.

Still no strlcpy and friends

Posted Mar 23, 2012 12:11 UTC (Fri) by adobriyan (guest, #30858) [Link]

> Well, strlcpy does not make any sense in UTF-8 world thus your wish is obviously granted.

Reasonable programming languages have real character data type.
Reasonable programming languages also have one-dimensional arrays.
By this route, reasonable programming languages get strings (for free).
They also have automatic OOB access checks.

No part of this strncpy/strlcpy() idiocy makes sense in reasonable programming language universe. They won't even understand what the fuss is all about.

Still no strlcpy and friends

Posted Mar 23, 2012 13:22 UTC (Fri) by dgm (subscriber, #49227) [Link]

> No part of this strncpy/strlcpy() idiocy makes sense in reasonable programming language universe. They won't even understand what the fuss is all about.

That's fine. They are not the people who need to care. If a slow and safe string is all you need, why worry about all this? Just use whatever is given in your chosen language and move on. There's nothing for you to see here.

Still no strlcpy and friends

Posted Mar 23, 2012 14:07 UTC (Fri) by adobriyan (guest, #30858) [Link]

> If a slow and safe string is all you need

nice

> Just use whatever is given in your chosen language and move on.

The problem is that I'm mostly a C guy.
But there is nothing given in C.
Exactly nothing.

The correct fix belongs in the programming language.
Looking at other PLs and several C "solutions" it should be obvious.

From this POV, Ulrich's decision is very smart.

Still no strlcpy and friends

Posted Mar 23, 2012 15:12 UTC (Fri) by tialaramex (subscriber, #21167) [Link]

“Reasonable programming languages have real character data type.”

A "real character data type" is almost never what you actually want because of how poorly defined characters are (or from another point of view, how many different and incompatible definitions there are for "character").

It's tempting to create a data type for Unicode code points (Java specifications, some parts of the Win32 API, and various databases historically did this, to their cost) because they are sometimes called "characters". But they can also represent things which aren't intuitively characters (such as the Byte Order Mark, or the LTR/RTL mode switches) and they can represent fractions of a character (like a macron) or symbols which are arguably groups of characters (like ligatures).

On the whole it's best to forget "characters" and handle only strings and sequences of bytes. This obliges the programmer to focus with due caution on any places that translate between the two. On the rare occasion that you do want to process Unicode code points they fit nicely into any modern integer type, such as 32-bit signed integers common in C.
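
For the rare case where code points are wanted, here is a rough sketch of decoding one UTF-8 sequence into a 32-bit integer. The function name utf8_decode is made up for illustration, and uint32_t is used here rather than a signed type. It checks the basic byte structure only; it does not reject overlong encodings or surrogates, so treat it as a sketch and not a validator:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the n bytes at s.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is malformed or cut short. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    /* The lead byte tells us how long the sequence is. */
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid byte */
    if (len > n)
        return 0;                   /* sequence runs past the buffer */
    /* Each continuation byte (10xxxxxx) contributes six more bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}
```

Note that the result is a code point, not a "character" in any of the senses discussed above.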

This still leaves you with plenty of tricky problems (e.g. canonicalisation) with your Unicode strings if you need more work.

Still no strlcpy and friends

Posted Mar 23, 2012 20:23 UTC (Fri) by cmccabe (guest, #60281) [Link]

> > “Reasonable programming languages have real character data type.”

> A "real character data type" is almost never what you actually want
> because of how poorly defined characters are (or from another point of
> view, how many different and incompatible definitions there are for
> "character").

Please. We're trying to do "programming language advocacy" here.

Don't intrude on this with your "facts" or "logic."

Can I get an A-men?

Still no strlcpy and friends

Posted Mar 26, 2012 18:58 UTC (Mon) by bronson (subscriber, #4806) [Link]

Great post! It's unfortunate how many people still think they should be using C/C++'s fundamental char type.

The days of pointer arithmetic and character arrays have mostly drawn to a close. Even in C.

Still no strlcpy and friends

Posted Mar 27, 2012 15:09 UTC (Tue) by dgm (subscriber, #49227) [Link]

I think you need to reread the post you're replying to.

Still no strlcpy and friends

Posted Apr 2, 2012 20:57 UTC (Mon) by bronson (subscriber, #4806) [Link]

Care to say more? Code like "while(*c != '/') *b++ = *c++" doesn't work so well anymore.

Still no strlcpy and friends

Posted Mar 24, 2012 3:14 UTC (Sat) by cmccabe (guest, #60281) [Link]

> Well, strlcpy does not make any sense in UTF-8 world thus your
> wish is obviously granted.

Wow. You just put forward the ONLY reasonable argument in this thread for why strlcpy should not be included in glibc. Congratulations.

However, your argument is flawed, because it assumes that I will not check the return value of strlcpy and act appropriately. Functions have return values for a reason. If I am dealing with UTF8 data, I will not try to truncate the string on an arbitrary byte boundary. I will simply bail out when I'm told the string is too long to fit in my buffer.
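
To make the byte-boundary problem concrete, here is a small sketch that finds where a cut would have to land to avoid splitting a multi-byte sequence. The helper names are hypothetical and the code assumes the input is already valid UTF-8. It only avoids splitting a code point; it knows nothing about combining marks or other multi-code-point units:

```c
#include <stddef.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: length of the longest prefix of s, at most
 * max_bytes long, that does not end in the middle of a multi-byte
 * sequence. Assumes s is valid, NUL-terminated UTF-8. */
static size_t utf8_prefix_len(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);

    if (len <= max_bytes)
        return len;                 /* the whole string fits */
    /* s[max_bytes] is the first byte that would be dropped; back up
     * until it starts a sequence instead of continuing one. */
    while (max_bytes > 0 && is_utf8_continuation((unsigned char)s[max_bytes]))
        max_bytes--;
    return max_bytes;
}
```

If the returned length differs from the byte limit that was asked for, a plain byte-wise truncation at that limit would have cut a sequence in half, which is one more reason to bail out on overlong input.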

The rest of the thread is just a bunch of "advocacy" (read: ignorance). Most of it seems to center on people saying "just use std::string, it will fix what ails ye!"

This is good except that,
1. std::string offers the same level of support for unicode as char* -- i.e., none.
2. using std::string to blindly copy user-supplied data opens you up to a different kind of security vulnerability, the denial of service.
3. std::string always allocates space on the heap, which makes it unsuitable for many uses
4. a lot of functions in the C++ standard library take char* arguments, so you have to learn how to use char* anyway.

Then there's the people who claim that still other programming languages will magically make the problems of string manipulation go away.

It simply isn't so.

For example, in Python or Java, control characters can be inserted into strings. This means that printing the strings to stdout later could cause undesirable effects to the user. If they are running in an xterm or in a GNU screen session, it could execute arbitrary commands. Will the higher level languages protect you from this? No.

Then there's the problem of normalization. There are four different ways to normalize unicode strings. That means that if your programming language wants to natively support the operation of comparing two strings, it has to have four different kinds of equals sign.

There are no Java worshippers in this thread (strange, we have every other kind of critter) but just to set the record straight, you cannot simply count the number of chars in a string in Java and assume that that is the length. Java strings are sequences of 16-bit UTF-16 code units, so this will work only for the basic multilingual plane. There are reasons to use Java, but this isn't one of them.

tl;dr. There are lots of gotchas surrounding using strings in C, but some of them exist in every language. Unicode is complex and understood by few. Simple solutions to complicated problems are usually wrong.

Looks like you are correct…

Posted Mar 24, 2012 7:56 UTC (Sat) by khim (subscriber, #9252) [Link]

Actually, the arguments against strlcpy look sensible, but most of them center on the ability to not check the return value (it looks like it's impossible to use it correctly without checking the return value).

Why not declare it like this:

#ifdef I_REALLY_NEED_STRLCPY
#pragma GCC diagnostic error "-Wunused-result"
size_t strlcpy(char *dst, const char *src, size_t size) __attribute__((warn_unused_result));
#endif

GLibC is tightly tied to GCC anyway and it looks like this approach should be fine WRT safety, no?

P.S. In reality I'm not sure I like strlcpy all that much: it's interface is too complicated. You really can not do anything sensible with strlcpy return value except compare it with “size” and for that strcpy_s looks like a saner interface (especially in C++).

Looks like you are correct… or not

Posted Mar 24, 2012 12:38 UTC (Sat) by smurf (subscriber, #17840) [Link]

strcpy_s is useless. Its argument order is *of course* different, and the requirement to not change the destination when the source doesn't fit ignores the only reason why strlcpy() even exists -- as opposed to a macro that calls strlen and memcpy.

Of course you can do something sensible with strlcpy's return value -- you can use it as offset to the end of the string, instead of calling strlcat(), when you want to append something.

NB: *its interface. :-P
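
A sketch of the offset trick mentioned above. Since glibc does not ship strlcpy, sketch_strlcpy below is a local stand-in written to follow the documented BSD behaviour (NUL-terminate if size > 0, return strlen(src)); both it and join2 are hypothetical names for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in following the documented strlcpy behaviour. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical helper: write a followed by b into dst.
 * Returns 0 on success, -1 if the result did not fit. */
static int join2(char *dst, size_t size, const char *a, const char *b)
{
    size_t off = sketch_strlcpy(dst, a, size);

    if (off >= size)
        return -1;                  /* a alone was truncated */
    /* off is the index of the NUL just written, so append there. */
    if (sketch_strlcpy(dst + off, b, size - off) >= size - off)
        return -1;                  /* b was truncated */
    return 0;
}
```

The second call starts at dst + off, so nothing has to rescan the part of the buffer that was already written.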

Looks like you are correct… or not

Posted Mar 24, 2012 13:19 UTC (Sat) by khim (subscriber, #9252) [Link]

> Its argument order is *of course* different, and the requirement to not change the destination when the source doesn't fit ignores the only reason why strlcpy() even exists -- as opposed to a macro that calls strlen and memcpy.

You want to imply that strlcpy exists only to introduce subtle security holes? Then the arguments to not include it are quite obviously valid…

> Of course you can do something sensible with strlcpy's return value -- you can use it as offset to the end of the string, instead of calling strlcat(), when you want to append something.

Right. But the point is that you should not do that. Low-level functions like memcpy, strlcpy, or strcpy_s only make sense when you deal with buffers of fixed size. If you need/want to deal with reallocation and other similar tricks then the whole thing becomes so fragile that it must be put either in a separate set of functions or in the language core.

> NB: *its interface. :-P

Yes. A horrible one. A good API must not only be easy to use correctly, it must also be hard to misuse. strlcpy does so-so on the first requirement and completely fails the second, while strcpy_s is fine on both fronts.

Still no strlcpy and friends

Posted Mar 23, 2012 9:29 UTC (Fri) by HelloWorld (guest, #56129) [Link]

You didn't get the point. The point is that if you actually want to make sure that strlcpy won't silently truncate your string, you need to find the terminating NUL anyway, and at that point you might as well use memcpy.

Still no strlcpy and friends

Posted Mar 23, 2012 12:58 UTC (Fri) by smurf (subscriber, #17840) [Link]

Umm, what? strlcpy finds that NUL and copies my string, all in one pass.
memcpy only copies, so I'd need a second pass.

The whole point of strlcpy is that it does a strlen+memcpy at the same time. That's where its name comes from.

Please post sample code that's any shorter, faster OR safer than

if (strlcpy(dest,src,LEN) >= LEN) { return_with_error }

before making such assertions. Thanks.

C as a language does not have exceptions. You can complain all you want that calling strlcpy() and friends without checking its result leaves your data in an inconsistent state, but that's a problem of the language. Any other function that copies a string into a buffer will have the exact same problem.
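
For readers without a BSD libc at hand, here is a rough sketch of the pattern above. sketch_strlcpy is a local stand-in that follows the documented strlcpy behaviour (copy at most size - 1 bytes, always NUL-terminate when size > 0, return the full length of src); it is not the OpenBSD source, and copy_or_fail is a hypothetical wrapper name:

```c
#include <stddef.h>

/* Local stand-in following the documented strlcpy behaviour. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t i = 0;

    if (size > 0) {
        /* Copy while there is room for this byte plus the NUL. */
        while (i + 1 < size && src[i] != '\0') {
            dst[i] = src[i];
            i++;
        }
        dst[i] = '\0';
    }
    /* Keep scanning so the return value is the full length of src. */
    while (src[i] != '\0')
        i++;
    return i;
}

/* Returns 0 if src fit into dst, -1 if it was truncated. */
static int copy_or_fail(char *dst, size_t size, const char *src)
{
    return sketch_strlcpy(dst, src, size) >= size ? -1 : 0;
}
```

When the string fits, the source is walked exactly once. On truncation, dst holds a NUL-terminated prefix, so the caller still has to check the result before using it.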

Still no strlcpy and friends

Posted Mar 23, 2012 13:37 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

If you ever have to handle strings in encodings that have shift states or multi-byte characters, not even strl{cat,cpy} are safe enough for production use.

Still no strlcpy and friends

Posted Mar 23, 2012 13:56 UTC (Fri) by adobriyan (guest, #30858) [Link]

It depends on what "response to truncation" is.
If it's usual i-do-not-care exit(EXIT_FAILURE), strlcpy() is OK.
If it's not and, say, dst reallocation is required, using strlcpy is double work if truncation happens.

"inconsistent state" is not a problem of the language. strlcpy could easily restore original \0.

Still no strlcpy and friends

Posted Mar 23, 2012 15:21 UTC (Fri) by HelloWorld (guest, #56129) [Link]

> if (strlcpy(dest,src,LEN) >= LEN) { return_with_error }
Why would you write code like that? If you want to avoid truncating strings, you need to dynamically allocate a buffer to copy the string into, and to do so you need its size.

By the way, this problem doesn't even exist when using a sensible string representation which stores the string's length separately, such as GString. Functions like strlcpy are just a (bad) workaround for C's string representation.
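
A minimal sketch of the allocate-to-fit approach, in plain C without GString. The helper name copy_alloc is hypothetical; the caller owns the returned buffer and must free() it:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of src sized to fit, or NULL
 * if allocation fails. If len_out is non-NULL it receives the length,
 * so the caller can carry it around instead of rescanning for the NUL. */
static char *copy_alloc(const char *src, size_t *len_out)
{
    size_t len = strlen(src);
    char *dst = malloc(len + 1);

    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);      /* includes the terminating NUL */
    if (len_out != NULL)
        *len_out = len;
    return dst;
}
```

Nothing is ever truncated here, at the cost of an allocation whose size is dictated by the input.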

Still no strlcpy and friends

Posted Mar 23, 2012 19:24 UTC (Fri) by smurf (subscriber, #17840) [Link]

I might have to fix a more-or-less-buggy legacy program which uses fixed-size buffers. My task, in this case, is not to rewrite the damn thing with GString, but to replace all the unsafe string crapola with sane code -- code that doesn't incur a performance penalty if at all possible.

str*cpy's use case is to copy a string A, whose length I don't know, to a buffer B, whose length I do know. Given this, my argument is that there's nothing better than strlcpy to do that safely. Given that this fairly common use case is not going to go away any time soon, IMHO not including it in glibc is stupid.

There are lots of situations with different use cases. I don't contest that. My point is that I don't always have a choice. In fact, whenever I _do_ have a choice, I use Python. :-P

Still no strlcpy and friends

Posted Mar 23, 2012 19:32 UTC (Fri) by HelloWorld (guest, #56129) [Link]

> str*cpy's use case is to copy a string A, whose length I don't know, to a buffer B, whose length I do know. Given this, my argument is that there's nothing better than strlcpy to do that safely. Given that this fairly common use case is not going to go away any time soon, IMHO not including it in glibc is stupid.
I completely disagree. New functions should be added to a library because they make sense for today's code, not because they make poor fixes to broken legacy programs easier. After all, nothing stops you from using strlcpy: write it yourself, copy it from somewhere, use libbsd, whatever.

Still no strlcpy and friends

Posted Mar 23, 2012 19:38 UTC (Fri) by HelloWorld (guest, #56129) [Link]

I just read tialaramex' comment here:
http://lwn.net/Comments/488249/
and he put it better than I could:
> New standard library functions should, on the whole, reflect good existing practice. It's not clear that strlcpy-like functions in existing code are good practice, they're often just laziness. The C library already has more than enough of that.

Still no strlcpy and friends

Posted Mar 23, 2012 9:36 UTC (Fri) by dgm (subscriber, #49227) [Link]

Actually this is a very nice angle.

Usually the people who compare C's string and memory functions just talk about their interfaces, as if they appeared in a vacuum. Comparing their actual usage patterns (both the usual and correct cases) would be so much more interesting.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds