
strlcpy()

Years of buffer overflow problems have made it clear that the classic C string functions - strcpy() and friends - are unsafe. Functions like strncpy(), which take a length argument, have been presented as the safe alternatives. But strncpy() has always been poorly suited to the task; it wastes time by zero-filling the destination string, and, if the string to be copied must be truncated, the result is no longer NULL-terminated. A non-terminated string can lead to overflows and bugs in its own right. So Linus finally got fed up and put together a new copy_string() function which does what most strncpy() users really wanted in the first place.
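
Both strncpy() complaints are easy to reproduce. What follows is only a minimal sketch, not kernel code; the helper names is_terminated() and strncpy_leaves_string() are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if buf[0..size-1] contains a terminating NUL byte, else 0. */
static int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/*
 * Hypothetical helper: copy with strncpy() and report whether the
 * destination still holds a valid C string afterwards.
 */
static int strncpy_leaves_string(char *dest, const char *src, size_t size)
{
    assert(size > 0);
    /* Pads with zero bytes if src is short; omits the NUL if src is long. */
    strncpy(dest, src, size);
    return is_terminated(dest, size);
}
```

Copying "abcdef" into a four-byte buffer this way leaves no terminator at all, while copying "ab" into a sixteen-byte buffer writes fourteen zero bytes that nobody asked for.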

As is often the case with this sort of security-related improvement, OpenBSD got there first. In fact, back in 1996, the OpenBSD team came up with a new string API which avoids the problems of both strcpy() and strncpy(). The resulting functions, with names like strlcpy(), have been spreading beyond OpenBSD. The basic function is simple:

    size_t strlcpy(char *dest, const char *src, size_t size);

At most size - 1 bytes of the source string are copied to the destination, and the result is always properly terminated as long as size is nonzero; the return value is the length of the source. If that length is greater than or equal to size, the caller knows that the string has been truncated.
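
The contract can be sketched in a few lines. This is an illustration of the semantics described here, not the OpenBSD or kernel implementation; the names sketch_strlcpy() and copy_truncated() are hypothetical, chosen so as not to clash with a C library that already provides strlcpy().

```c
#include <assert.h>
#include <string.h>

/* Sketch of the strlcpy() contract: bounded copy, always terminated. */
static size_t sketch_strlcpy(char *dest, const char *src, size_t size)
{
    size_t srclen;

    assert(src != NULL);
    srclen = strlen(src);

    if (size > 0) {
        /* Copy at most size - 1 bytes, leaving room for the terminator. */
        size_t n = (srclen < size) ? srclen : size - 1;

        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    /* Always report the full source length, even when truncated. */
    return srclen;
}

/* Returns nonzero if src did not fit into a buffer of size bytes. */
static int copy_truncated(char *dest, const char *src, size_t size)
{
    return sketch_strlcpy(dest, src, size) >= size;
}
```

The truncation test is a single comparison against the buffer size, which is the whole point of returning the source length rather than the number of bytes copied.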

Linus agreed that following OpenBSD's lead was the right way forward, and strlcpy() is in his BitKeeper repository, waiting for 2.5.71. There has also been a flurry of activity to convert kernel code over to the new function. By the time 2.6.0 comes out, strncpy() may no longer have a place in the Linux kernel.



strlcpy()

Posted May 29, 2003 16:26 UTC (Thu) by eh (guest, #266) [Link] (6 responses)

Thanks for noting this function; note also that *BSD has a strlcat().

Maybe the glibc folks should consider adding these two.

BTW, OpenBSD might have been first, but Free&NetBSD have them now too.

Once again LWN has called my attention to something I'm glad to know.

strlcpy()

Posted May 29, 2003 17:05 UTC (Thu) by cpeterso (guest, #305) [Link] (5 responses)

I agree that glibc should add these functions. *BSD has had them for years. I think even the Solaris libc has them.

I think glibc should go a step further and actually remove the dangerous functions like strcpy(). There are safe replacements. glibc should use a carrot and stick approach. Offer the safe replacements and remove (or just deprecate) the dangerous functions.

strlcpy()

Posted May 29, 2003 17:40 UTC (Thu) by Ross (guest, #4065) [Link]

There are many perfectly valid uses of strcpy(). In fact most uses
of strcpy() are probably correct. Removing that function would lead
to huge problems with backwards compatibility, cross-platform portability,
and standards compliance. It would be a very, very bad idea.

Feel free to use strlcpy() exclusively yourself, but don't tell everyone
else what they can or can't use. The only function I have ever agreed
with removing is gets() because there is a nearly 100% chance of a bug
every time it is used.

strlcpy()

Posted May 29, 2003 19:54 UTC (Thu) by dwheeler (guest, #1216) [Link] (3 responses)

Yes, strlcpy() is in the *BSDs, it's also in Sun Solaris. It's also in the library Glib (NOT glibc), the basic library for GTK+ and GNOME.

However, it is NOT in glibc, because Ulrich Drepper doesn't want it in there. You can see his rationale for this in the glibc mailing list.

I don't agree with his decision. One of the biggest security problems is STILL buffer overflows, and strlcpy()/strlcat() can really help reduce their incidence. And since it's not in glibc, everyone has to "roll their own" implementation (which may not be as efficient as it would be if it were in the standard library). If a future C standard included strlcpy() in the standard C library, then I believe glibc would add it too. Thus, if you want strlcpy() available everywhere, it might be best to appear to the ISO C group to add strlcpy() and strlcat() to the C standard library.

strlcpy()

Posted May 29, 2003 23:14 UTC (Thu) by eh (guest, #266) [Link] (2 responses)

I grabbed libc-hacker-*.bz2 from
ftp://sources.redhat.com/pub/glibc/mail-archives/
and grep found neither strlcpy nor strlcat.

What was his rationale?

> If a future C standard included strlcpy() in the standard C library,
> then I believe glibc would add it too.

I certainly hope you're right about that.

> [...] it might be best to appear to the ISO C group to add strlcpy()
> and strlcat() to the C standard library.

Yes, provided s/\(appea\)r/\1l/ ;)

strlcpy()

Posted May 30, 2003 4:52 UTC (Fri) by rloomans (guest, #759) [Link] (1 responses)

> What was his rationale?

The thread starts here. Christoph Hellwig posted a patch to implement strlcat() and strlcpy(). Ulrich Drepper replies scathingly...

strlcpy()

Posted May 30, 2003 14:59 UTC (Fri) by eh (guest, #266) [Link]

Well, thanks for the link, but I almost wish I hadn't asked.
Now sickened early in the morning, waiting for beer o'clock.

> Ulrich Drepper replies scathingly...

... and inappropriately, even idiotically.

sentence 1: claims inefficiency, contradicting Usenix paper cited by
Hellwig, does not support claim.
sentence 2: claims strl*() lead to ``other errors'', again unsupported.
sentence 3: On the soapbox with disregard for real world and the problem
strl*() are meant to address.

Next, in reply to Hellwig's reply to his first reply he reveals the One True:

*((char *) mempcpy (dst, src, n)) = '\0';

He doesn't say whether n is sizeof(dst)-1, or strlen(src), but either way
this must be preceded by setup and error-checking code. The former
needs n >= strlen(src) or the copy is potentially truncated. So either way
previous error-prone code is necessary, a strlen() is necessary (he argued
efficiency), and you wind up with the elements necessary for plain old
strcpy() (stpcpy() if you're saving the return). So what does mempcpy()
do that's better than strlcpy()?

Next muddled paragraph suggests strl*() are buggy because ``If a string is
too long for an allocated memory block the copying must not simply
silently stop.'' He seems to have missed that his mempcpy() thingy will
``simply silently stop'' and requires advance knowledge of strlen(src)
whereas the return from strlcpy() eases truncation detection which
subsequent code can then handle.
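
To make the comparison concrete, here is roughly what a caller has to wrap around that one-liner. It is only a sketch under my own assumptions: bounded_copy() is a made-up name, and memcpy() plus an offset stands in for the GNU-specific mempcpy(), which returns dst + n directly.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical wrapper around the copy-then-terminate idiom.
 * Returns 1 if src had to be truncated, 0 otherwise.
 */
static int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    size_t n = strlen(src);      /* length must be known before copying */
    int truncated = 0;

    assert(dstsize > 0);         /* need room for at least the terminator */
    if (n > dstsize - 1) {       /* the bounds check the caller must write */
        n = dstsize - 1;
        truncated = 1;
    }
    /* Equivalent to: *((char *) mempcpy(dst, src, n)) = '\0'; */
    *((char *) memcpy(dst, src, n) + n) = '\0';
    return truncated;
}
```

Everything above the last statement is exactly the setup that strlcpy() does on the caller's behalf.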

I don't know why I bothered writing this. I'm annoyed now. There's a reason
I usually just lurk.

I don't read the list and don't know Ulrich Drepper's character. I only hope
he just had a bad day. That was 08/2000, maybe someone should bring
the subject up again. (Now someone will say it's been brought up again
and he maintains the same arguments, right?)

(To counteract the tone of this posting I want to say I really admire glibc and
its developers.)


strlcpy()

Posted May 29, 2003 20:27 UTC (Thu) by nas (subscriber, #17) [Link] (12 responses)

I wrote a public domain version of strlcpy since the BSD version is licensed with the annoying advertising clause.

better code

Posted May 29, 2003 22:20 UTC (Thu) by ncm (guest, #165) [Link] (7 responses)

I have posted a better implementation, also public domain.

Anyway I think it's better. You decide.

better code

Posted May 30, 2003 4:15 UTC (Fri) by tjc (guest, #137) [Link]

I timed both your implementation and Linus' through a 4 billion iteration loop, and Linus has you by about 10 to 15 percent. But then he's using memcpy(), so there's an issue with overlapping source and destination strings.

better code

Posted May 30, 2003 19:36 UTC (Fri) by tjc (guest, #137) [Link] (5 responses)

OK, here's my implementation:

size_t strlcpy(char *dest, const char *src, size_t n)
{
        int len, i;

        if (!n)
                return 0;

        len = strlen(src);
        if (len >= n)
                len = n - 1;

        /* check for overlapping source and destination */
        if ((src < dest && src + len >= dest)
                || (dest < src && dest + n > src)
                || src == dest)
        {
                size_t i;

                for (i = 0; src[i] && i < n - 1; i++)
                        dest[i] = src[i];

                dest[i] = (char) 0;
        }
        else
        {
                memcpy(dest, src, len);
                dest[len] = (char) 0;
        }

        return len;
}

I haven't tested this extensively, but I've found that the best way to debug code is to post it on the internet. ;-) People come out of the woodwork...

better code

Posted May 31, 2003 1:39 UTC (Sat) by dododge (guest, #2870) [Link] (3 responses)


> /* check for overlapping source and destination */
> if ((src < dest && src + len >= dest)

Pointer comparison is undefined if the pointers are not within the same object, so this is not portable standard C.

memmove is safe for overlapping regions, and since it's part of the standard library it's allowed to use architecture-specific magic to compare arbitrary pointers. It can also make use of optimized machine code, so it's potentially more efficient than any implementation written in standard C.
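
As a rough sketch of what that would look like (my own illustration, not the kernel or OpenBSD code; overlap_safe_strlcpy() is a hypothetical name), the overlap test disappears entirely once memmove() does the copying. This version also returns the full length of the source, as the article describes.

```c
#include <assert.h>
#include <string.h>

/* strlcpy()-style copy that tolerates overlapping src and dest. */
static size_t overlap_safe_strlcpy(char *dest, const char *src, size_t size)
{
    /* Measure src first, before the copy can overwrite any of it. */
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = (srclen < size) ? srclen : size - 1;

        assert(dest != NULL);
        memmove(dest, src, n);   /* defined even if the regions overlap */
        dest[n] = '\0';
    }
    return srclen;
}
```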

> but I've found that the best way to debug code is to post it on the internet.

I haven't looked closely at the rest of the function. If you want it thoroughly picked apart you could try posting it to USENET comp.lang.c [insert evil laugh] :-)

better code

Posted May 31, 2003 6:04 UTC (Sat) by tjc (guest, #137) [Link] (2 responses)

> Pointer comparison is undefined if the pointers are not within the same object, so this is not portable standard C.

I've heard of this, but I have never read a good explanation. I just assumed that this restriction has something to do with pointer aliasing. If you understand this, now is the time to show off! ;-)

> memmove is safe for overlapping regions, and since it's part of the standard library it's allowed to use architecture-specific magic to compare arbitrary pointers.

How is the performance of memmove()? Does it copy memory a double word at a time? I couldn't find a general way to do this without using architecture-specific magic as you say, so I fell back to memcpy().

BTW, s/int len, i/size_t len/ above. I noticed this about 100ns after I posted. Always use -Wall...

better code

Posted Jun 2, 2003 6:13 UTC (Mon) by eru (subscriber, #2753) [Link]

>> Pointer comparison is undefined if the pointers are not within the same
>> object, so this is not portable standard C.
>
> I've heard of this, but I have never read a good explanation. I just
> assumed that this restriction has something to do with pointer aliasing. If
> you understand this, now is the time to show off! ;-)

I always assumed the restriction in the C standard exists mainly because
in segmented memory management, the numeric values of pointers do not
necessarily correspond to their relative arrangement in memory.
Comparison is meaningful only for pointers that have the same segment
part. Since Linux does not use segmentation, at least not in a
user-visible way, there is no need to worry about this. (Few operating
systems use segments these days, but I happen to work with one that does,
even though it runs on the 32-bit versions of x86. Yes, 48-bit pointers!)

better code

Posted Jun 4, 2003 3:39 UTC (Wed) by dododge (guest, #2870) [Link]

>> Pointer comparison is undefined if the pointers are not within the same object,
> I've heard of this, but I have never read a good explanation. I just assumed that this restriction has something to do with pointer aliasing.

Well, it's undefined because the standard explicitly says so :-). As to why the standard says so, there is presumably some architecture out there that C works on which cannot reliably support comparing arbitrary pointers; or allowing this comparison might make it too difficult to implement C on certain architectures. The most obvious reason for allowing this would be to implement memmove, which the standard already provides.
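
For code that really does need an overlap test outside the C library, one common workaround is to compare the addresses as integers. The sketch below is an assumption-laden illustration: regions_overlap() is a made-up helper, uintptr_t is an optional type, and the result of the conversion is implementation-defined, so the answer is only meaningful on platforms with a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether [a, a + alen) and [b, b + blen)
 * share any byte, comparing integer addresses instead of pointers.
 */
static int regions_overlap(const void *a, size_t alen,
                           const void *b, size_t blen)
{
    uintptr_t pa = (uintptr_t) a;
    uintptr_t pb = (uintptr_t) b;

    /* Guard against wraparound when computing the end addresses. */
    assert(alen <= UINTPTR_MAX - pa && blen <= UINTPTR_MAX - pb);

    /* Two half-open ranges overlap if each starts before the other ends. */
    return pa < pb + blen && pb < pa + alen;
}
```

This trades undefined behavior for implementation-defined behavior, which is an improvement but still not something the standard promises will match the memory layout.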

Portability discussions come up in comp.lang.c fairly often, and someone occasionally chimes in with an example of a real architecture they deal with where common-sense assumptions about computer architecture don't hold true. There's a lot of weird designs out there, and when you start talking about embedded devices they may even have a larger installed base than anything x86-derived. The worst case is the "DeathStation 9000", a hypothetical machine where even the most subtle undefined behavior produces catastrophic results.

> How is the performance of memmove()? Does it copy memory a double word at a time?

Depends on your C library, compiler, operating system, chip architecture, etc. You'll have to examine your libc source to find out how it's done for your system. And you'll also have to check your compiler output to make sure it actually calls the libc implementation. For example gcc 2.95.3 on sparc-sun-solaris produces inline assembly for small memcpy operations rather than actually calling into libc.

> Always use -Wall...

I'm rather fond of -ansi -pedantic -Wall -W myself :-)

better code

Posted May 31, 2003 20:36 UTC (Sat) by fjord (guest, #6510) [Link]

Hmm, I haven't actually read the original documentation for the strl* functions, but according to this:

http://sources.redhat.com/ml/libc-alpha/2000-08/msg00110.html

strlcpy should return the size of the source string and do nothing, if the buffer is too small.

Off by one?

Posted May 29, 2003 22:21 UTC (Thu) by raph (guest, #326) [Link] (2 responses)

I'm pretty sure you have an off-by-one error there - your code can write up to (size+1) bytes of dst, while from the paper it looks like the correct semantics are to write up to size bytes only - so that strlcat(buf, src, sizeof(buf)) is safe.
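
For reference, here is a sketch of the strlcat() semantics as I understand them from the paper: size is the full size of the destination buffer, and no byte at or beyond dest[size] is ever touched. The name sketch_strlcat() is hypothetical and this is not the OpenBSD code.

```c
#include <assert.h>
#include <string.h>

/* Append src to dest, writing at most size bytes of dest in total. */
static size_t sketch_strlcat(char *dest, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen, room, n;

    assert(dest != NULL && src != NULL);
    slen = strlen(src);

    /* Find the end of dest without reading past size bytes. */
    while (dlen < size && dest[dlen] != '\0')
        dlen++;
    if (dlen == size)
        return size + slen;      /* no terminator within size: append nothing */

    room = size - dlen - 1;      /* bytes available before the terminator */
    n = (slen < room) ? slen : room;
    memcpy(dest + dlen, src, n);
    dest[dlen + n] = '\0';

    /* Length the result would have had with unlimited space. */
    return dlen + slen;
}
```

A guard byte placed just past the buffer is a cheap way to catch the off-by-one: after any call it must be unchanged.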

Off by one?

Posted May 30, 2003 4:40 UTC (Fri) by ncm (guest, #165) [Link] (1 responses)

I agree, Neil's is buggy. Walk through it with size == 1. It clobbers one beyond the end of the input array.

I wonder if we should bother about what to do if size == 0. Mine crashes spectacularly, which is a Good Thing.

Getting within 15% of memcpy is pretty damn good, in my estimation. Of course I didn't read Linus's version, or OpenBSD's; that would be cheating, and I would be tainted besides. Of course now that I have been told, via cleanroom methods, I can adjust mine to be equally fast, and maybe (one can hope) actually identical to both Linus's and OpenBSD's.

Off by one?

Posted May 30, 2003 16:28 UTC (Fri) by tjc (guest, #137) [Link]

> Getting within 15% of memcpy is pretty damn good, in my estimation. Of course I didn't read Linus's version, or OpenBSD's; that would be cheating, and I would be tainted besides. Of course now that I have been told, via cleanroom methods, I can adjust mine to be equally fast, and maybe (one can hope) actually identical to both Linus's and OpenBSD's.

You're probably going to have to copy more than one char at a time to match memcpy() for speed.

strlcpy()

Posted Jun 5, 2003 14:32 UTC (Thu) by djm (subscriber, #11651) [Link]

Rubbish - the OpenBSD strlcpy has NO advertising clause. It is licensed under an ISC license, which is about as liberal as you can get.

It is licensed that way so people don't have to make stupid errors when reinventing the wheel.

strlcpy()

Posted Jun 10, 2003 19:22 UTC (Tue) by hogsberg (guest, #11751) [Link]

Forgive me for nit-picking, but the correct term is NUL-terminated. NULL is the special pointer in C, NUL is the ASCII character with integer value 0 used for terminating strings.

Kristian


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds