LWN.net Logo

Format string vulnerabilities

By Jake Edge
February 1, 2012

A recent sudo advisory described a "format string vulnerability" that could be used for privilege escalation. Since sudo runs as setuid-root, that means that it could potentially be used by a regular user—not just one listed in the /etc/sudoers file—to compromise the system. As with many security flaws, format string vulnerabilities are the result of improper handling of user-supplied input. Given this latest report, it's probably worth taking a look at how these kind of vulnerabilities come about.

For those who aren't C programmers, a little background may be in order. The standard C library function for printing things to stdout is printf()—other functions in the same family can be used to print to stderr, character buffers, or other files. That function takes a string as its first argument which can contain special formatting characters that describe the types of the rest of the arguments. For example:

    printf("hello, world\n");
    printf("%s\n", "hello, world");
    printf("%s, %s\n", "hello", "world");
would all print the canonical string to stdout. The "%s" is the format specifier for a string, so the function expects the corresponding argument to be a pointer to a null-terminated array of characters.

Members of the printf() family use the "varargs" (variable arguments) facility of the C language to take an arbitrary number of arguments. When the formatting string is parsed, values are popped off the stack in the order they are listed. Those values are expected to be there by the function, but, given the existing ABI, the compiler does not (in fact cannot) enforce that they be placed there by caller. That's where the problem can occur.

In the easy case, compilers can and do warn when there is a mismatch between the format string and arguments. A call like:

    printf("hello, %s\n");
will cause a warning if the warning level is set high enough (like -Wall for GCC). But those kinds of problems are relatively easily found. A trickier problem occurs with something like:
    printf(str);
which is perfectly legal as long as str contains no formatting characters. If it does, however, the function will happily pop things off the stack that don't correspond to the arguments in that formatting string. For GCC, the "-Wformat -Wformat-nonliteral" flags can be used to detect this kind of thing. In the "best" case, having format specifiers in str will lead to a program crash, in the worst, it could end up executing code. If str comes from user-supplied input, an attacker may be able to arrange just the right formatting string to execute code of their choosing.

That may be bad enough for a program run as an unprivileged user (as the attacker's code might be the equivalent of "rm -rf $HOME"), but it is far worse if the program has root privileges as sudo does. According to Wikipedia, format string bugs were noted in 1990, but were not recognized as a security problem until a researcher auditing proftpd reported a way to exploit the bug. That exploit used the "%n" format, which stores the number of characters printed so far to an integer pointer it pops off the stack. By arranging just the right format string, the exploit would overwrite the current user ID.

In the sudo case, the program name (which is stored in argv[0] for C programs) was being printed as part of an error message. As the advisory from the finder describes, the program name was "printed" into a buffer (using a variant of sprintf()), and that buffer was then handed off to a vfprintf() as the format string. That meant that the user-controlled program name—which could certainly contain format specifiers—was used as the format string for the vfprintf(). The fix for sudo is to ensure that the program name is printed with a "%s" specifier in the final print statement, rather than building it into the earlier buffer.

How can the user control the program name, especially for a setuid binary like sudo? That's not very hard either:

    $ ln -s /usr/bin/sudo %n

The sudo advisory notes that building sudo with -D_FORTIFY_SOURCE=2 will prevent these kinds of exploits, though the advisory from the finder notes an article in Phrack that may make it possible to bypass that protection.

The problem in sudo was introduced relatively recently, for version 1.8.0 released at the end of February 2011. It has now been fixed in 1.8.3p2 and affected distributions are starting to get updates out. These kinds of bugs are yet another lesson in the need for great care when handling user-controlled input.


(Log in to post comments)

Make the stack less fragile

Posted Feb 2, 2012 11:03 UTC (Thu) by epa (subscriber, #39769) [Link]

Why not a different calling convention? Push the arguments onto the stack followed by an integer saying how many arguments there are. Then it will be impossible to mistakenly pop off too many or too few.

This would slow down function calls a little bit, but it could be used as an alternative calling convention for varargs functions only. There would still be format string vulnerabilities, but this particular class of them would go away.

Make the stack less fragile

Posted Feb 2, 2012 12:53 UTC (Thu) by etienne (subscriber, #25256) [Link]

You can have a lot of different and incompatible calling conventions, like pushing the number of parameters (as a 32 or 64 bits number), pushing the size of the parameter area in bytes, telling which parameters are const pointers and which are pointers to variables, ...
That will often result in more security, but also slower software; and in C the simplest system has been chosen, which allows you to "bolt-on" more complex ones if you wish.

Make the stack less fragile

Posted Feb 2, 2012 16:50 UTC (Thu) by epa (subscriber, #39769) [Link]

I think that the C standard doesn't specify any particular calling convention? Implementers are free to implement stack frames and function calls however they like.

Make the stack less fragile

Posted Feb 3, 2012 20:15 UTC (Fri) by nix (subscriber, #2304) [Link]

Yes, but in practice for existing platforms such things are frozen by the needs of interoperability.

Make the stack less fragile

Posted Feb 3, 2012 20:15 UTC (Fri) by nix (subscriber, #2304) [Link]

Doesn't help. Now all you need to do is buffer-overrun that integer and you can pop as many args as you like, or as few. You've just given the attacker control over the stack frame in ways they can only dream of right now, so they can force functions to execute with other functions' local variables overlapping their own, or with garbage, or with their own local variables' values shifted into other variables partially or completely. A whole new class of security holes!

Make the stack less fragile

Posted Feb 5, 2012 10:38 UTC (Sun) by epa (subscriber, #39769) [Link]

Not so: clearly the generated code for a function such as foo(int a, int b) would first check that exactly two items are on the stack. If the integer at the top of the stack is not 2, then something has gone very wrong.

The only case where this integer number of arguments would change the program's behaviour (as opposed to redundantly stating what is already expected to be the case) is for varargs functions. And in those cases you validate the number of arguments on the stack against an expected number. If the format string is "%d%d" but there are 3 values on the stack, again something is wrong.

To cause an exploit the attacker must both manipulate the format string and somehow overwrite the number-of-arguments value at the top of the stack. It is no longer possible to take too many or too few values from the stack because of a format string vulnerability or other varargs bug.

Make the stack less fragile

Posted Feb 5, 2012 21:19 UTC (Sun) by nix (subscriber, #2304) [Link]

Well, yes, the attack vector is exactly that: overwrite the num-of-args value, and all bets are off.

Make the stack less fragile

Posted Feb 6, 2012 11:37 UTC (Mon) by epa (subscriber, #39769) [Link]

Perhaps I don't understand what you mean. To attack a varargs function such as printf() you would need to attack *both* the format string vulnerability and smash the stack somehow to overwrite the number-of-args value. This is harder than just exploiting the format string without stack protection.

So if printf() gets the format string "%d %d" but number-of-args != 2, it aborts. You would need to find a format string vulnerability *and* a stack-overwriting exploit to change the number-of-args value.

If you overwrite just the number-of-args value at the top of the stack, this is merely a denial of service attack for a call foo(a, b). It would not cause foo() to somehow take three arguments instead, because the number of args to pop off the stack is compiled in. Given all the other fun and games you can get by overwriting values on the stack (the return address in particular), I don't think that a number-of-args value presents a juicy target.

How come they did not get a warning?

Posted Feb 2, 2012 15:20 UTC (Thu) by rvfh (subscriber, #31018) [Link]

Apparently Ubuntu forces -Wformat-security by default, as documented here, meaning that GCC will complain if I do that:

$ cat main.c
#include <stdio.h>

int main(void) {
        const char* str = "Hello, world";
        printf(str);
        return 0;
}
$ gcc main.c
main.c: In function ‘main’:
main.c:5:2: warning: format not a string literal and no format arguments [-Wformat-security]

I suppose Fedora and SuSE do the same, so my question is: what do the sudo developers compile on? And which warnings do they use? Or more importantly: how do they react when a warning appears? In my experience, warnings have a strong smell of mushrooms bugs.

How come they did not get a warning?

Posted Feb 3, 2012 12:42 UTC (Fri) by jwakely (subscriber, #60262) [Link]

Strangely enough, the sudo code wasn't doing a "hello world" toy example like the article and your one. Try this, which is closer to the real code
#include <stdarg.h>
#include <stdio.h>

void
sudo_debug(const char* progname, const char *fmt, ...)
{
    va_list ap;
    char fmt2[200];
    sprintf(fmt2, "%s: %s\n", progname, fmt);
    va_start(ap, fmt);
    vfprintf(stderr, fmt2, ap);
    va_end(ap);
}
Still get a warning?

How come they did not get a warning?

Posted Feb 5, 2012 21:35 UTC (Sun) by k8to (subscriber, #15413) [Link]

I naively tend to think this pattern should be not used. I don't see much call for par-formatted strings that are later formatted again.

If you believe you have a fixed string, you can do the moral equivalent of

printf("%s", str);

which is what I do in my code.

If you need to build out a string piecemeal, you can build and append to a string without formatting it more than once.

If you need to do some fairly sophisticated templating functionality that really requires multiple passes of interpretation, there are libraries that are designed for that purpose. Although you should think hard if you really need that; typically you don't.

How come they did not get a warning?

Posted Feb 11, 2012 23:27 UTC (Sat) by cras (guest, #7000) [Link]

clang gives a warning with it.

In 2011?

Posted Feb 2, 2012 16:27 UTC (Thu) by NAR (subscriber, #1313) [Link]

I always thought that just as we no longer use gets() in C code, nobody would write printf(variable). I mean it's kind of reflex to write printf("", ), then fill the rest of the stuff.

In 2011?

Posted Feb 2, 2012 22:11 UTC (Thu) by iabervon (subscriber, #722) [Link]

Nobody does write printf(variable). Unfortunately, they do write vprintf(fmt, args), where fmt is the concatenation of some strings and an appropriate format string. The goals were: call a single stdio function in order to avoid interleaving your message with other output (potentially from a subprocess); print a bit of identifying information at the beginning; and print a caller-supplied format string with its appropriate arguments. There's not really a good way to do all of these together, since you can't insert things in a va_list.

In 2011?

Posted Feb 8, 2012 3:11 UTC (Wed) by IkeTo (subscriber, #2122) [Link]

Format string with character not controlled by the application is universally bad. I think it is more reasonable to vsnprintf the application-supplied information to a fixed sized buffer, and printf the result with the user-supplied argv[0] with a fixed fmt. This limits the length of the output to the buffer size, but that is probably good for a log file anyway. The double-copy speed overhead is probably irrelevant.

In 2011?

Posted Feb 3, 2012 5:27 UTC (Fri) by geofft (subscriber, #59789) [Link]

The subtlety here is that the sudo code wanted to prefix the application's name to the message, and the message came from an arbitrary format. So you have code which looks very much like the following pseudocode:
sudo_debug(char *fmt, ...) {
    char *real_fmt = asprintf("%s: %s", argv[0], fmt);
    printf(real_fmt, ...);
}
It's actually not the strangest thing in the world to pass a non-constant argument to the printf format. For instance, the following would be totally reasonable:
sudo_debug(char *fmt, ...) {
    char *real_fmt = asprintf("sudo: %s", argv[0], fmt);
    printf(real_fmt, ...);
}

In 2011?

Posted Feb 3, 2012 12:42 UTC (Fri) by NAR (subscriber, #1313) [Link]

If the format string comes from (potentially untrusted) source, then we're screwed anyway, that has to be verified along with the arguments. And probably that is the right time to generate the output string too (at least the parts coming from the user).

Format string vulnerabilities

Posted Feb 2, 2012 17:59 UTC (Thu) by csd (subscriber, #66784) [Link]

puts has been in place for this for a long time. Since printf actually... There is absolutely no need to ever use printf(str) to just print a string without interpreting it. It's faster too.

Format string vulnerabilities

Posted Feb 2, 2012 22:22 UTC (Thu) by khim (subscriber, #9252) [Link]

How can it be faster? It generates identical code:
$ echo 'void foo() { printf("%s\n", "bar"); }' | gcc -S -O2 -xc - -o- 
	.file	""
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"bar"
	.text
	.p2align 4,,15
.globl foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	movl	$.LC0, %edi
	jmp	puts
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
	.section	.note.GNU-stack,"",@progbits

Format string vulnerabilities

Posted Feb 2, 2012 23:16 UTC (Thu) by csd (subscriber, #66784) [Link]

I meant that the *implementation* of puts is faster than printf, as puts doesn't have to parse through the first param like printf does. In your example, gcc simply optimized the code into calling puts instead of printf, which it can only do for a very limited number of cases (e.g. with a fixed constant as the 1st param, which is not the case that this article covers). In this very similar example, you can see that the generated code is quite different and will be slower to run:

$ echo 'void foo() { extern char *str; printf(str, "bar"); } ; char * str = "%s\n";' | gcc -S -O2 -xc - -o-
.file ""
<stdin>: In function ‘foo’:
<stdin>:1: warning: incompatible implicit declaration of built-in function ‘printf’
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "bar"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
movq str(%rip), %rdi
movl $.LC0, %esi
xorl %eax, %eax
jmp printf
.cfi_endproc
.LFE0:
.size foo, .-foo
.globl str
.section .rodata.str1.1
.LC1:
.string "%s\n"
.data
.align 8
.type str, @object
.size str, 8
str:
.quad .LC1
.ident "GCC: (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5"
.section .note.GNU-stack,"",@progbits

So I'll restate my original statement to: "... In most cases, it's faster too"

Format string vulnerabilities

Posted Feb 3, 2012 5:30 UTC (Fri) by geofft (subscriber, #59789) [Link]

But they weren't printing a literal string, they were trying to modify a format string to prepend the name of the program, and call printf again. They did correctly pass the program name to a "%s", they just passed the result of that to another printf-family call, which caused the program name to be interpreted at that point.

Format string vulnerabilities

Posted Feb 3, 2012 12:05 UTC (Fri) by AndreE (subscriber, #60148) [Link]

Why is it neccessary for sudo to call argv[0] in any case? Surely it is rarely if ever given an alias on users' systems

Format string vulnerabilities

Posted Feb 5, 2012 22:25 UTC (Sun) by k8to (subscriber, #15413) [Link]

It's best practice to use argv0 when generating error messages, so that your message reflects the program that was executed, even if it was renamed.

It's better than expecting that the program's name will always match the compiled-in name.

Sure, you get a new input vector, but that text is in your program anyway, and will be evaluated by some code you didn't write. I'm not convinced this is a real problem. Patterns where you allow program generated text to be used as a format string really should be either not used or very very carefully audited.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds