By Michael Kerrisk
July 18, 2012
Adding the
strlcpy() function
(and the related
strlcat() function)
has been a perennial request
(1,
2,
3)
to the
GNU C library (glibc)
maintainers,
commonly supported by a statement that
strlcpy()
is superior to the existing alternatives.
Perhaps the earliest request to add these
BSD-derived functions to glibc took the form of
a patch submitted in 2000
by a fresh-faced Christoph Hellwig.
Christoph's request was rejected, and subsequent requests have similarly been rejected (or ignored).
It's instructive to consider the reasons why
strlcpy()
has so far been rejected,
and why it may well not make its way into glibc in the future.
A little prehistory
In the days before programmers considered that someone else might want to
deliberately subvert their code, the C library provided just:
char *strcpy(char *dst, const char *src);
with the simple purpose of copying the bytes from the
string pointed to by
src
(up to and including the terminating null byte)
to the buffer pointed to by
dst.
Naturally, when calling
strcpy(),
the programmer must take care that the bytes being
copied don't overrun the
space available in the buffer pointed to by
dst.
The effect of such buffer overruns
is to overwrite other parts of a process's memory, such as
neighboring variables,
with the most common result being to corrupt data
or to crash the program.
If the programmer can with 100% certainty predict
at compile time the size of the
src
string,
then it's possible (if unwise) to preallocate a suitably sized
dst
buffer and omit any argument checks before calling
strcpy().
In all other cases, the call should be guarded with a suitable
if
statement to check the size of its argument.
However, strings (in the form of input text)
are one of the ways that humans interact with computers,
and thus quite commonly the size of the
src
string is controlled by the user of a program, not the program's creator.
At that point, of course, it becomes essential for every call to
strcpy()
to be guarded by a suitable
if
statement:
char dst[DST_SIZE];
...
if (strlen(src) < DST_SIZE)
    strcpy(dst, src);
(The use of
<
rather than
<=
ensures that there's at least one extra byte available for the null terminator.)
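One way to keep the check and the copy from drifting apart is to wrap them in a single helper. The following is only a sketch: the name copy_if_fits() and its boolean return convention are assumptions made for illustration, not part of any C library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if the whole string,
 * including its null terminator, fits in dst_size bytes.
 * Returns true on success; on failure dst is left untouched. */
static bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) < dst_size) {   /* '<' leaves room for the terminator */
        strcpy(dst, src);
        return true;
    }
    return false;                   /* caller decides how to report this */
}
```

The caller still has to test the return value, so the helper relocates the if statement rather than removing it.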
But it was easy for programmers to omit such checks
if they were forgetful, inattentive, or cowboys.
And later, other more attentive programmers realized that by
carefully controlling what was written into the overflowed buffer,
and overrunning into more exotic places such as
function
call return addresses stored on the stack,
they could do
much
more interesting things
with buffer overruns than simply crashing the program.
(And because code tends to live a long time,
and the individual programmers creating it can be slow
to learn about the sharp edges of the tools they use,
even today buffer overruns remain one of the
most
commonly reported vulnerabilities
in applications.)
Improving on strcpy()
Prechecking the arguments of each call to
strcpy()
is burdensome.
A seemingly obvious way to relieve the programmer
of that task was to add an API
that allowed the caller to inform the library
function of the size of the target buffer:
char *strncpy(char *dst, const char *src, size_t n);
The
strncpy()
function is like
strcpy(),
but copies at most
n
bytes from
src
to
dst.
As long as
n
does not exceed the space allocated in
dst,
a buffer overrun can never occur.
Although choosing a suitable value for
n
ensures that
strncpy()
will never overrun
dst,
it turns out that
strncpy()
has problems of its own.
Most notably,
if there is no null terminator in the first
n
bytes of
src,
then
strncpy()
does not place a null terminator after the bytes copied to
dst.
If the programmer does not check for this event,
and subsequent operations expect a null terminator to be present,
then the program is once more vulnerable to attack.
The vulnerability may be more difficult to exploit than a buffer overflow,
but the security implications can be just as severe.
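The missing terminator can be checked for, or avoided, with a few extra lines. The sketch below is illustrative only: is_terminated() and copy_terminated() are hypothetical names, and the second helper assumes that dst_size is greater than zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does buf contain a null byte within size bytes? */
static int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical helper: strncpy() followed by explicit termination.
 * Assumes dst_size > 0. Returns 1 if src was truncated, 0 otherwise. */
static int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    /* If src has dst_size - 1 or more bytes, strncpy() writes no
     * terminator of its own ... */
    strncpy(dst, src, dst_size - 1);
    /* ... so the last byte of dst is set unconditionally. */
    dst[dst_size - 1] = '\0';
    return strlen(src) >= dst_size;
}
```

Note that the result is terminated but possibly truncated, so the return value still needs to be examined by the caller.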
One iteration of API design didn't solve the problems, but perhaps a further one can…
Enter
strlcpy():
size_t strlcpy(char *dst, const char *src, size_t size);
strlcpy()
is similar to
strncpy()
but copies at most
size-1
bytes from
src
to
dst,
and always adds a null terminator following the bytes copied to
dst.
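The semantics just described can be sketched in a few lines of C. This is an illustrative stand-in, deliberately named sketch_strlcpy() to make clear that it is not the BSD source; it assumes the commonly documented behavior that the function returns the length of src and writes nothing when size is zero.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for strlcpy(); not the BSD implementation.
 * Copies at most size-1 bytes of src into dst, terminates dst whenever
 * size > 0, and returns strlen(src), the length of the string it tried
 * to create. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';              /* always terminated, unlike strncpy() */
    }
    return src_len;                 /* >= size means the copy was truncated */
}
```

A return value greater than or equal to size tells the caller that the copy was truncated.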
Problems solved?
strlcpy()
avoids buffer overruns and ensures that the output string is null terminated.
So why have the glibc maintainers obstinately refused to accept it?
The essence of the argument against
strlcpy()
is that it fixes one problem—sometimes failing to terminate
dst
in the case of
strncpy(),
buffer overruns in the case of
strcpy()—while leaving another:
the loss of data that occurs when the string copied from
src
to
dst
is truncated because it does not fit in
size
bytes.
(In addition, there is still
an unusual corner case
where the unwary programmer can find that
strlcat(),
the analogous function for string concatenation,
leaves
dst
without a null terminator.)
At the very least,
(silent) data loss is undesirable to the user of the program.
At the worst, truncated data can lead to security issues that
may be as problematic as buffer overruns,
albeit probably harder to exploit.
(One of the nicer features of
strlcpy()
and
strlcat()
is that their return values do at least facilitate the detection of
truncation—if the programmer checks the return values.)
All of which brings us full circle:
to avoid unhappy users and security exploits,
in the general case even a call to
strlcpy()
(or
strlcat())
must be guarded by an
if
statement checking the arguments,
if the state of the arguments
can't be predicted with certainty in advance of the call.
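For concatenation, the guard might look like the sketch below. Both functions are hypothetical: sketch_strlcat() is an illustrative stand-in rather than the BSD source, and append_checked() is an invented wrapper. The early return shows one way that dst can be left unterminated, which is assumed here to be the corner case mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for strlcat(); not the BSD implementation. */
static size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    size_t dst_len = 0;

    /* Look for dst's terminator, but never beyond size bytes. */
    while (dst_len < size && dst[dst_len] != '\0')
        dst_len++;

    /* Corner case: no terminator within size bytes. Nothing is
     * written, so dst stays unterminated. */
    if (dst_len == size)
        return size + src_len;

    size_t room = size - dst_len - 1;
    size_t n = (src_len < room) ? src_len : room;
    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';
    return dst_len + src_len;       /* length of the string it tried to build */
}

/* Hypothetical guard: append src to dst. Returns 0 on success, or -1 if
 * the result did not fit (dst then holds a truncated string, or is
 * unchanged in the corner case above). */
static int append_checked(char *dst, size_t size, const char *src)
{
    return (sketch_strlcat(dst, src, size) >= size) ? -1 : 0;
}
```

The if statement has not gone away; it has moved from before the call to after it.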
Where are we now?
Today,
strlcpy()
and
strlcat()
are present on many versions of UNIX
(at least Solaris, the BSDs, Mac OS X, and Irix),
but not all of them (e.g., HP-UX and AIX).
There are even implementations of these functions
in the Linux kernel
for internal use by the kernel code.
Meanwhile, these functions are not present in glibc,
and were rejected for inclusion in
the POSIX.1-2008 standard,
apparently for similar reasons to their rejection from glibc.
Reactions among core glibc contributors on the topic of including
strlcpy()
and
strlcat()
have been varied
over the years.
Christoph Hellwig's early patch was rejected
in the then-primary maintainer's inimitable style
(1 and
2).
But reactions from other glibc developers have been more nuanced,
indicating,
for example,
some willingness to accept the functions.
Perhaps most insightfully,
Paul Eggert notes
that even when these functions are provided
(as an add-on packaged with the application),
projects such as OpenSSH,
where security is of paramount concern,
still manage to either misuse the functions (silently truncating data)
or use them unnecessarily (i.e., the traditional
strcpy()
and
strcat()
could equally have been used without harm);
such a state of affairs does not constitute a
strong argument for including the functions in glibc.
The appearance of an
embryonic
entry on this topic in the glibc FAQ,
with a brief rationale for why these functions are currently excluded,
and a note that "gcc -D_FORTIFY_SOURCE"
can catch many of the errors that
strlcpy()
and
strlcat()
were designed to catch,
would appear to be something of a final word on the topic.
Those who still feel that these functions should be in glibc
will have to make do with the
implementations provided in
libbsd
for now.
Finally,
in case it isn't obvious by now,
it should of course be noted that the root of this problem lies
in the C language itself.
C's native strings are not
managed strings
of the style natively provided in more modern languages such as Java, Go, and D.
In other words, C's strings have no notion of bounds checking
(or dynamically adjusting a string's boundary) built into the type itself.
Thus, when using C's native string type,
the programmer can never entirely avoid the task of
checking string sizes when strings are manipulated,
and no replacements for
strcpy()
and
strcat()
will ever remove that need.
One might even wonder if the original C library implementers were clever
enough to realize from the start that
strcpy()
and
strcat()
were sufficient—if it weren't for the fact that they also gave us
gets().