GMP and assert()
A report of a potential security problem in the GNU Multiple Precision Arithmetic (GMP) library was met with a mixed reaction, from skepticism to responses verging on hostility, but the report ultimately raised a question worth pondering. What role should assertions (i.e. calls to the POSIX assert() macro) play in error handling? An assertion that fails leads to a process exit, which may not be what a developer calling into a library expects. Unexpected behavior is, of course, one step on a path that can lead to security holes.
Jeffrey Walton posted
a message with "Asserts considered harmful (or GMP spills its
sensitive information)
" as its subject line to the oss-security mailing
list right at the end of 2018. He
noted that GMP uses
assert() for error checking, which could lead to a number of
undesirable effects. Software calling GMP might well be handling
encryption keys or other sensitive information, which could be exposed via
core dumps or other means (e.g. error-reporting services) due to the abort()
call made in response to the failing assertion.
As the subject of the mail might indicate, the tone of the message was a tad combative and, perhaps, a bit condescending. That may help explain some of the reactions to it. Walton's message begins as follows:
Many programs can safely use assert to crash a program at runtime. However, the [prerequisite] is, the program cannot handle sensitive information like user passwords, user keys or sensitive documents.
High integrity software, like GMP and Nettle, cannot safely use an assert to crash a program.
He goes on to describe what happens when an assertion fails and how that can lead to sensitive information leaking. He also includes an example that uses the Nettle cryptographic library (which calls into GMP) to trigger an assertion, crash, and core dump. It turns out that the assertion is actually caused by a bug in Nettle that has since been fixed. Walton built Nettle and GMP with -DNDEBUG, which is meant to disable assert(). As noted above, that did not disable assert() in GMP, but did work for Nettle. However, that tickled a bug in one of the Nettle test cases, which caused the GMP assertion to fail. Discovering that led to some dismissive responses, but the larger point still stands: should applications expect libraries to abort()?
Most in the thread argued that the danger of trying to continue past a failed assertion could be worse than the information leak. Security-sensitive programs should already be ensuring that no core dumps are produced and that any abort()-reporting services are disabled. In addition, as Vincent Lefèvre pointed out, there are plenty of other ways for a program to crash (e.g. segmentation fault) that could lead to the same information leak. So programs handling sensitive information need to be prepared for that in any case.
The crux of Walton's argument is that the assertions would be better handled by returning an error rather than crashing (or attempting to crash, since programs could be set up to handle the SIGABRT). That may be true but it would change the API and, perhaps, slow down the library, Lefèvre said.
The developer of Nettle, Niels Möller, agreed:
"Crashing in a controlled fashion may also be *more* secure that
continuing execution with undefined results. Depending on circumstances,
of course.
" He went on to say that Walton's perspective differed
from his own:
In the final analysis, developers of programs handling sensitive information need to be more careful than other developers. One could argue that those developers should be familiar with the behavior of any libraries that they use, so that assertion failures are not unexpected. Beyond that, as GMP developer Torbjörn Granlund pointed out, the API of a library is determined by its developers; other projects are free to use the library—or not—at their discretion.
| Index entries for this article | |
|---|---|
| Security | Vulnerabilities/Library |
Posted Feb 27, 2019 22:04 UTC (Wed)
by BenHutchings (subscriber, #37955)
[Link] (5 responses)
Posted Feb 27, 2019 22:14 UTC (Wed)
by roc (subscriber, #30627)
[Link] (4 responses)
Posted Feb 27, 2019 23:44 UTC (Wed)
by gdt (subscriber, #6284)
[Link] (3 responses)
Even if the secret is protected with both MADV_DONTDUMP and mlock() the library may copy that memory. That copying might not be unreasonable -- to access data in modern CPUs is to copy it, at the very least into CPU caches and registers.
We saw the same thing recently with password databases, where graphics libraries retained the contents of previously-displayed text and so passwords were still in RAM even if the password database had erased its own storage. In turn those graphics libraries may feel that free()ing data means it has passed from their responsibility.
Posted Feb 28, 2019 6:14 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Feb 28, 2019 7:33 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (1 responses)
That is an interesting new twist in the display-password-or-display-dots question. Accessing graphics memory is probably more of a threat than shoulder surfing these days. (I would be interested to know if it really is a realistic threat, as in worth the effort for an attacker.) A flag marking a graphics surface as sensitive or something on those lines might help.
Posted Feb 28, 2019 22:23 UTC (Thu)
by flussence (guest, #85566)
[Link]
It's absolutely trivial to demonstrate pulling unsanitized chunks of VRAM out of a GPU too: just connect one tiny screen and one huge one, switch VT (linux fbcon shrinks to fit the smaller screen, leaving the bottom right of the larger one uninitialised) and dump the framebuffer using whatever means available. The empty space is black on kernel 4.20 so I assume it's fixed now, but for many years this wasn't the case.
Posted Feb 28, 2019 3:59 UTC (Thu)
by mm7323 (subscriber, #87386)
[Link]
As already said, applications can use prctl() if too sensitive to core-dump, and that covers all cases not just abort () triggers
Posted Feb 28, 2019 9:51 UTC (Thu)
by smurf (subscriber, #17840)
[Link]
Posted Feb 28, 2019 14:00 UTC (Thu)
by devkev (subscriber, #74096)
[Link] (34 responses)
The whole point of a library is to be embedded inside some larger process. When you create a process, it's not by running a library, it's by running a program. So it should be the program that decides when that process ends. The lifecycle of that process is none of the library's business, and the library should not do anything to interfere with it. It is entirely inappropriate for a library to unilaterally pull the rug out from under the entire process by early terminating it.
If the library has entered a state where it can no longer perform useful work, then it can just tell the application that. It can return a special "fatal" error response, and if it's completely hosed, then it could set a flag so that all other library calls immediately return the same fatal error. This flag would only be cleared when the problematic state has been fixed (eg. by a special function call which re-inits the entire library, clearing out all its internal state).
For example, the program might be using the library for some optional functionality. In this case, the program could be happy to continue on without the library, using some fallback/alternate instead. Only the program can make that decision, and the library should not even try to second-guess it.
It is completely reasonable for developers to expect that libraries won't go and call exit(), or kill(getpid(), SIGKILL), or similar. Just like I expect random root daemons won't go and reboot my whole system just because they've encountered some problem that they consider severe.
Whatever happened to the "catch low, handle high" error handling philosophy? Libraries have no knowledge of the surrounding context in which they are running - ie, the rest of the program. But this is precisely why they should not be handling such errors, only reporting them. The caller can bubble the error up to a level that is best placed to handle it appropriately.
Posted Feb 28, 2019 14:59 UTC (Thu)
by pizza (subscriber, #46)
[Link] (7 responses)
We both know that "can" actually means "will ignore completely" 99.7% of the time.
Posted Feb 28, 2019 15:28 UTC (Thu)
by NAR (subscriber, #1313)
[Link] (6 responses)
Posted Feb 28, 2019 15:50 UTC (Thu)
by intgr (subscriber, #39733)
[Link] (5 responses)
There are some frameworks that handle assertion failures by logging and returning: GNOME things like glib, GTK, etc.
Have you ever tried running a GNOME application in a terminal? The majority of them spew errors and warnings left and right; developers seem to just ignore the errors because they are so easy to ignore. No-one bothers reporting them as bugs, although many of them probably are.
As far as I can tell, the whole industry has been slowly moving towards a more "fail fast, fail loud" approach. I'm surprised that it has suddenly became controversial.
Posted Feb 28, 2019 16:41 UTC (Thu)
by NAR (subscriber, #1313)
[Link] (3 responses)
Posted Feb 28, 2019 20:15 UTC (Thu)
by mageta (subscriber, #89696)
[Link] (1 responses)
Posted Mar 3, 2019 13:41 UTC (Sun)
by hadess (subscriber, #24252)
[Link]
There are 2 type of "errors" that crash the entire process in a GLib program: a call to g_error(), and memory allocation failure. Both of those are documented.
Posted Mar 1, 2019 18:31 UTC (Fri)
by notriddle (subscriber, #130608)
[Link]
Posted Mar 3, 2019 13:45 UTC (Sun)
by hadess (subscriber, #24252)
[Link]
They're easy to ignore because most "GNOME applications" aren't launched from a terminal and people rarely read all the errors in the journal when apps otherwise function.
And application developers are thankful for the bug reports, when they happen.
Note that if you feel courageous, you could set "G_DEBUG=fatal_warnings" and all the warnings would make the app crash. Send backtraces!
Posted Feb 28, 2019 16:00 UTC (Thu)
by sorokin (guest, #88478)
[Link] (19 responses)
Although technically correct and this is what should be done for most errors by default, this behavior is not appropriate for some errors. Let me present an example.
Example 1. Let's say we write a function sqrt. sqrt is not applicable for negative numbers. OK let's make this function return error code in case the argument is negative (EDOMAIN). Now, let's write function hypot using this sqrt. OK, mathematically hypot always calls sqrt with non-negative argument. What should hypot do when sqrt returns EDOMAIN? Should it propogate the error to its caller? If the error is propagated to the caller of hypot, what will be the name of the error? "my internal call to sqrt said that its argument is negative"?
I claim that hypot should not propagate this error to the caller. hypot should assert that sqrt never returns EDOMAIN. That's the example why asserts are useful and not all errors should be propagated to callers.
Example 2. We can go further. If we look at typical usages of sqrt we'll see that most of the time (in good code base perhaps always) people use it with only non-negative arguments. As the result most users just assert that sqrt returns EDOMAIN. Can we move this assert into sqrt. Yes, we can and as the result the signature of sqrt simplifies:
error_code sqrt(number in, number& out); // old: wide contract (sqrt applicable for all arguments)
With new signature function is much easier to use and can be more efficient (as it has fewer checks).
That's the second example why asserts can be useful (to make code simpler).
Posted Feb 28, 2019 16:10 UTC (Thu)
by sorokin (guest, #88478)
[Link] (17 responses)
There are two types of errors:
It is a pity that people use same word "error" for both of these conditions.
1 can and must be handled using "propagate error to caller policy".
Posted Feb 28, 2019 20:30 UTC (Thu)
by HenrikH (subscriber, #31152)
[Link] (1 responses)
Or when malloc() returns NULL on a kernel with overcommit (the default), even logging the error would require more memory to be allocation and thus also fail.
Posted Feb 28, 2019 22:42 UTC (Thu)
by devkev (subscriber, #74096)
[Link]
This is not true. First of all, maybe the condition which caused malloc() to fail was transient, and subsequent mallocs will succeed. But even if not, the proper behaviour in this case is to pre-allocate (or use static allocation) a reasonable sized buffer to use for this logging. This is why things like sigaltstack() exist, for example.
Posted Feb 28, 2019 22:34 UTC (Thu)
by devkev (subscriber, #74096)
[Link] (7 responses)
This is basically saying "Don't want to have to check this, because it cannot possibly have failed" - famous last words.
> 2 must be checked with asserts.
Of course libraries should check for internal consistency, and that invariants are actually true. But such a condition can only be (at most) fatal for the library, it absolutely does not follow that it necessarily has to be fatal for the whole program. A library which presumes otherwise is making unnecessarily selfishly assumptions about the program in which it's running. This means that for libraries, at least, assert(3) is simply the wrong tool for this job.
hypot should not propagate sqrt's EDOMAIN, because the fact that hypot calls sqrt is an implementation detail. hypot returning EDOMAIN is for situations where hypot has been called with invalid or improper arguments. Instead, when hypot wants to check its internal consistency, namely that the value it got back from sqrt is sensible and can be safely used, if it finds that this is not the case it should just immediately return EUNKNOWN or EINTERNAL.
Here's an analogy. Someone comes up to you on the street and asks if you have the time. You say "Yes" and look at your wrist to see your watch, and are surprised that it's not there. You don't know why, maybe you left it at home, maybe the strap broke and it fell off, or maybe someone stole it. Do you punch the person in the face and run away? Or do you say "Sorry, I expected to be able to tell you the time, but it turns out that I can't and I don't know why"?
Posted Feb 28, 2019 23:51 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link]
In memory-unsafe land (which is where a lot of libraries exist), that depends on what the invariant is and what the implications of the invariant being wrong are.
> You say "Yes" and look at your wrist to see your watch, and are surprised that it's not there.
If I'm surprised at that point that my watch isn't there, I want to see a neurologist pronto, because it means the sense of touch in my left arm isn't working properly :P
(100g of close-fitting stainless steel bracelet tends to make its presence felt.)
Posted Mar 6, 2019 10:27 UTC (Wed)
by renox (guest, #23785)
[Link] (5 responses)
Posted Mar 6, 2019 12:09 UTC (Wed)
by excors (subscriber, #95769)
[Link]
That sounds a lot like exceptions. Errors get propagated up the call stack until someone handles the error, and if nobody handles it then the application crashes. (Of course exceptions introduce a whole new set of problems so they're just as bad as every other approach to error handling.)
Posted Mar 7, 2019 16:53 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (3 responses)
Rust and other languages with algebraic data types has this. You get a `Result<T, E>` where in order to access the `T` (the wanted result), you either need to use `.unwrap()` which turns into a panic if an error did occur or you have to pattern match and then there's an obvious error codepath that exists in the code.
Posted Mar 8, 2019 8:51 UTC (Fri)
by gracinet (guest, #89400)
[Link]
To be complete, that leaves the case where one does not care about the result (type T in your example), and simply ignores it.
Of course the Rust compiler has a warning for that.
Posted Mar 14, 2019 9:26 UTC (Thu)
by jezuch (subscriber, #52988)
[Link] (1 responses)
Posted Mar 14, 2019 13:32 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
git push
Note that this particular error is now better worded in that it mentions the object id that it is missing instead.
Posted Feb 28, 2019 22:50 UTC (Thu)
by mgb (guest, #3226)
[Link] (6 responses)
What if the application prefers to finish landing the plane before figuring out how the library managed to violate its own invariant?
Posted Mar 1, 2019 2:59 UTC (Fri)
by pizza (subscriber, #46)
[Link] (5 responses)
If application in question is flight-critical avionics, then every line of source code (including all libraries and compile-time settings) will have been fully audited a-priori, rendering your question moot.
That isn't to say that unanticipated issues can't happen.. indeed, safety-critical systems are typically designed under the assumption that unexpected issues _will_ occur. Consequently there are requirements that the overall system must be able to fail (and recover) gracefully.
Posted Mar 1, 2019 7:23 UTC (Fri)
by mgb (guest, #3226)
[Link] (4 responses)
Unmanned space missions have been lost due to typos. A navy cruiser (Yorktown) lost its computer network and propulsion because of an unexpected zero. And at least one F-22 crashed due to a software error.
Posted Mar 1, 2019 14:27 UTC (Fri)
by pizza (subscriber, #46)
[Link] (3 responses)
So if your point is auditing doesn't catch everything (especially in light of incomplete or incorrect specifications) then sure, we're in complete agreement.
If it's not, then I have no idea what point you are trying to make.
Posted Mar 1, 2019 16:01 UTC (Fri)
by mgb (guest, #3226)
[Link] (2 responses)
Good. Then let's allow the application do it's best to finish landing the plane before we stop to figure out how the library managed to violate its own invariant.
Posted Mar 1, 2019 18:56 UTC (Fri)
by pizza (subscriber, #46)
[Link] (1 responses)
Do I have to point out that when it comes to avionics, there is no distinction made between "the library" and "the application"?
Meanwhile, there have been quite a few situations where "the application tried to finish landing the plane" straight into the ground.
Posted Mar 2, 2019 10:02 UTC (Sat)
by devkev (subscriber, #74096)
[Link]
No. It wasn't meant literally. It was meant to illustrate the idea that the application might have things that it would prefer to do on error conditions, and those things can't be done if the library's narrow view causes the whole application to be killed.
> Meanwhile, there have been quite a few situations where "the application tried to finish landing the plane" straight into the ground.
And innumerable more where the remaining systems - affected by the problem as well as unaffected - were able to be used to land the plane safely. A faulty altimeter doesn't shut down the entire plane, because the rest of the plane is _probably_ fine. But in this analogy, that is exactly what happens when a library calls abort/assert.
Posted Mar 13, 2019 22:05 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Dangerous statement :-) What about i?
Cheers,
Posted Mar 1, 2019 22:17 UTC (Fri)
by rweikusat2 (subscriber, #117920)
[Link] (5 responses)
At the very least, any assert in a library out to be part of the documented interface. Otherwise, there's no way anyone can safely use the library for anything where "sudden failure for an unknown reason" is not a sensible option.
Posted Mar 2, 2019 3:35 UTC (Sat)
by rra (subscriber, #99804)
[Link] (4 responses)
You can argue that was bad design, but if the library is in widespread use already, the mistake has already been made, and one still has to decide how to move forward from here. If the library is currently asserting on internal consistency errors or API violations, and you change it to to return an error from functions that previously never returned an error, that's quite dangerous, because you've now broken the API contract with all existing callers, which may proceed on entirely erroneous assumptions and do something even more dangerous than crashing. You can introduce new APIs and deprecate the existing APIs, but that's potentially a monumental amount of work, with no guarantee of practical success (and has to be prioritized against everything else the library developers could be doing).
One of the issues with the original report is that it completely failed to engage with this problem. Even if you agree that libraries should never assert, the reality is that GMP already does assert and does not define error returns in some of the places that it asserts. One can argue that it never should have gotten into that position, but now that it is, it's very inobvious that changing GMP to return errors is correct.
Posted Mar 2, 2019 11:10 UTC (Sat)
by devkev (subscriber, #74096)
[Link] (3 responses)
Basically, the library adds a new function called libfoo_register_fatal_handler, which takes a function pointer that's called whenever an assertion fails. The default handler just calls abort.
This approach preserves existing legacy API and behaviour, while also allowing programs to opt-in to arbitrary error handling (including abort, noop, or setting a flag to know that the returned value will be garbage and inhibit future calls into the library). Changing shouldn't be overly hard, since if you call your new macro assert() then you can avoid needing to touch code all over the library.
Posted Mar 2, 2019 17:29 UTC (Sat)
by rra (subscriber, #99804)
[Link]
Posted Mar 3, 2019 6:20 UTC (Sun)
by songmaster (subscriber, #1748)
[Link] (1 responses)
Posted Mar 4, 2019 20:16 UTC (Mon)
by xtifr (guest, #143)
[Link]
Posted Mar 2, 2019 10:01 UTC (Sat)
by mfuzzey (subscriber, #57966)
[Link]
That way if, the application *wants* to assert rather than checking return values for internal error it can.
If the application doesn't set the handler or the handler does nothing the library should return "internal error"..
Posted Mar 8, 2019 2:19 UTC (Fri)
by john.carter (guest, #123615)
[Link] (1 responses)
You cannot make a library fool proof, fools are too ingenious.
You can only tell the fool that he has been foolish.
Returning error codes are for telling the caller that something they couldn't possibly know about, has occurred. eg. Some external process has deleted the file, so open cannot succeed, so return ENOENT.
Precondition asserts are for telling the caller that there is a bug in their program that can only be fixed by a new version of their program.
Of course any part of a process can kill a process, this is a fact of programming life.
Divide by zero, dereference null.... do you want the CPU to ignore those errors and just womble on doing stupider and stupider things.... sometimes working, sometimes doing rubbish, sometimes dying?
Attempting to detect and unwind every possible flavour of stupid on the part of the caller merely creates an untested and untestable spaghetti of error handling code.
I get infuriated by people who write such spaghetti... and then never even check the error code, and just cast it to void. Experience tells me whenever I see that pattern... I will find bugs.
Far far more opportunities for security flaws and leaks to hide in that stuff.
People get horribly confused between "something unexpected came in over the wire from an external system" vs "your program has a bug and will do wrong _every time_ until you get off your butt and fix it".
An assert should _only_ be used for second thing, and should _always_ die as there is no other sane and reliable way of recovering.
Posted Mar 22, 2019 0:40 UTC (Fri)
by mcortese (guest, #52099)
[Link]
Ideally I'd like the application to degrade in a localized and controlled way. For example, if the battery applet on my panel, which calculates the % of charge dividing the current charge by the max capacity, faces a divide-by-zero for whatever reason (say, a hardware failure of the battery control), it could simply show 0% or 100% or even a meaningless value, and yet this would be ways better than killing the whole panel[1].
So yes, sometimes wombling on doing stupider and stupider things is the desirable approach.
[1] In fact, this is not a made-up example but a real case with LXDE panel.
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
number sqrt(number in); // new: narrow contract (sqrt applicable only for non-negative arguments)
GMP and assert()
1. Rare, but perfectly possible conditions e.g. "file not found".
2. Violation of internal invariants e.g. "my example with hypot above".
"propagate error to caller policy" should not be used for 2, because it will cause all sorts of "this must not be possible here" issues. 2 must be checked with asserts.
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
Ideally the language should know about error return values and add an assert if the caller doesn't handle the error return value himself..
But I don't know of a language which allow this..
GMP and assert()
> But I don't know of a language which allow this..
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
fatal error: no such file: path/to/some/lfs/file/on/another/branch
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
Wol
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
GMP and assert()
error codes and asserts have different purposes.
error codes and asserts have different purposes.
