A slow path to a fast fix
In April 2014 Al Viro, as part of the project of converting much of the kernel's virtual filesystem code over to the iov_iter abstraction, committed a patch reworking how the pipe code performs I/O. In the old version, pipes would work with a struct iovec array; the replacement used struct iov_iter instead. It looked like a straightforward code refactoring and cleanup with, it was thought, no user-visible changes at all.
That turned out not to be true; the old code contained a subtle mistake that was absent from the new version. If an attempt to perform a read or write operation atomically (without the possibility of blocking) failed, the code would retry in a non-atomic mode. But if that failure occurred partway through an operation with multiple iovecs to handle, the code would forget about the iovecs it had already processed; the subsequent retry would then run off the end of the iovec array. There is no indication that Al knew about this bug at the time; his rework simply eliminated it in passing. Others eventually figured it out, though.
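To make the failure mode concrete, here is a small user-space sketch of that pattern. It is modeled on the description above rather than on the actual kernel source, and every name in it is invented for illustration:

    /* A compilable illustration of the bug pattern described above; this is
     * NOT the old kernel code, just a model of its behavior.  The helper
     * mutates the iovec array as it copies and leaves the fallback to its
     * caller, exactly the combination that went wrong in fs/pipe.c. */
    #include <string.h>
    #include <sys/uio.h>

    /* Copy 'len' bytes from 'iov' into 'dst', advancing iov_base/iov_len in
     * place.  Returns 0 on success, -1 if a copy "faults" (simulated here by
     * a NULL base standing in for an unmapped user page). */
    static int copy_from_iovecs(char *dst, struct iovec *iov, size_t len)
    {
        while (len) {
            /* Skip drained iovecs.  Note the missing bound check: if an
             * earlier, partially successful call already emptied the array,
             * this walks right off its end. */
            while (iov->iov_len == 0)
                iov++;
            if (iov->iov_base == NULL)
                return -1;                      /* simulated page fault */
            size_t n = len < iov->iov_len ? len : iov->iov_len;
            memcpy(dst, iov->iov_base, n);
            dst += n;
            iov->iov_base = (char *)iov->iov_base + n;  /* mutates the caller's iovec */
            iov->iov_len -= n;
            len -= n;
        }
        return 0;
    }

    int main(void)
    {
        char dst[8], a[4] = "aaa";
        struct iovec iov[2] = {
            { .iov_base = a,    .iov_len = 4 },
            { .iov_base = NULL, .iov_len = 4 },  /* this one "faults" */
        };

        if (copy_from_iovecs(dst, iov, 8) < 0) {
            iov[1].iov_base = a;   /* pretend the slow path faulted the page in */
            /* Buggy retry, with the full length: iov[0] was already drained
             * by the first call, so once iov[1] empties, the skip loop runs
             * past the end of the two-entry array. */
            copy_from_iovecs(dst, iov, 8);
        }
        return 0;
    }

A correct retry would pass only the length that remains or, better, keep its progress in a separate cursor instead of editing the caller's iovecs; that second option is essentially what the iov_iter conversion did.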
But it took a while. In March of 2015, nearly one year later, a Red Hat bugzilla entry was filed for this issue, indicating that: "A local, unprivileged user could use this flaw to crash the system or, potentially, escalate their privileges on the system." Red Hat's developers quickly verified that the RHEL 5, 6, and 7 distributions were all vulnerable to the problem. They were a little less quick to fix it, though; the first alerts didn't come out until the beginning of June, nearly three months later. A flood of advisories came out in the following months; see the LWN vulnerability entry for a list.
Then things went quiet for a while, but that doesn't mean that nothing was happening. It would appear that the developers of the KingRoot tool happened upon this issue and were able to write an exploit for it. KingRoot is an Android app intended to let users gain root access on their devices. It is not something that most users would see as malicious; they install it intentionally so that they can gain control over their own devices. But if KingRoot can exploit this issue, then other Android apps can do so as well.
Once Google became aware of the issue, a patch was put together and an advisory was sent out to the world. Concern about security issues led Google to start sending out monthly security updates a while back, but this one was seen as being even more serious than that: an extraordinary, mid-month update was created for it. Owners of Nexus devices should be seeing it show up, though your editor's phone has not yet received the update.
With luck, this one was caught before it could be used to cause serious harm, though it is difficult to know that for sure. Meanwhile, owners of devices that do not receive regular security updates — that would be most of them — are out of luck; they will remain vulnerable for the foreseeable future. In any case, it is unfortunate that a problem that was fixed two years ago, and understood to be a security issue one year ago, had to be addressed in an extraordinary update so much later on.
There is no knowing why Google didn't identify this vulnerability as being important when it was first disclosed. Perhaps it didn't seem like something that could be exploited. One can only wonder how many other known but unfixed issues have slipped under the radar in the same way; that number is unlikely to be zero.
There are a number of conclusions that can be drawn from this event, most of which are hardly new or original. A much wider variety of bugs than one might think can be exploited; this is especially true in kernel space. There is value in improving code; designing and implementing a better I/O-tracking abstraction fixed a severe bug that the developers didn't even know was present. There is value in running newer code; the fixes went into 3.16, so anything running a kernel at least that recent is unaffected. Unfortunately, Android devices are not known for running recent kernels.
But, most to the point, our current security process is failing us. If we were able to fix problems as quickly as we like to think, it would still be insufficient; it's increasingly clear that patching things up after somebody discloses them is not enough. But, sometimes, problems seemingly slip through the cracks and are not fixed quickly, even when they are known. Exploit writers are less inclined to drag their feet, and will thus always be ahead in this game. We need solutions that are not so exclusively dependent on patching up vulnerabilities after the fact.
Index entries for this article
Security: Linux kernel
Posted Mar 24, 2016 6:13 UTC (Thu)
by viro (subscriber, #7872)
[Link]
iov_iter_copy_from_user() predates the whole series; it goes back to 2007 and used to live in mm/filemap.c - it's a replacement of filemap_copy_from_user() that doesn't mutilate iovecs; a part of the series by Nick Piggin that had introduced iov_iter in the first place. The whole pile (fault-in, try to use kmap_atomic, etc.) had originated there, actually - in September 2002, in a series from Andrew Morton. In 2006 Jens Axboe had done something similar in fs/pipe.c. That's when the bug had been introduced.
iov_iter_copy_from_user() did the same song and dance (try to copy under kmap_atomic, etc.), but it didn't modify iovec *and* didn't advance the iov_iter. The latter was actually a bad idea - that's what copy_page_from_iter() had done differently, and that's what pretty much every caller wanted. However, pipe.c variant *did* modify iovecs, and what's more, it didn't do a self-contained fallback. So it had to be called twice. In a sense, that bug was a result of bad calling conventions...
Combination of iov_iter_copy_from_user() and advancing the iterator was fairly obvious from the look at its callers. What's more, it *did* the fallback, so it looked like it should've simplified the hell out of pipe_writev() - the thing there was obviously a homegrown analogue of that thing, with bloody inconvenient calling conventions.
So there it went. As it turned out, the homegrown analogue (with very obvious intended behaviour) wasn't correct. I really should have done the massage towards the open-coded copy_page_from_iter() - it would've caught the bogosity there and then, *and* it would have avoided compounding another bug in there with an embarrassing bug of my own. In case of failed copy_from_user() in the very beginning EFAULT had been lost - write() ended up returning 0 instead. Eric Biggers had caught that (and a very similar lost error nearby from all way back in 2006) last October...
As an aside, early enough it had become pretty clear that "take an array of iovecs, consume some data, adjust ->iov_{base,len} so that the next time it would point past the data consumed" was a source of recurring bugs waiting to happen. So systematic switch to iterators leaving the iovecs never modified was likely to reduce the likelihood of that class of bugs. Some got noticed when fixing (e.g. CVE-2014-0069 got caught while doing similar conversion in cifs_iovec_write(); the fix got reordered in front of the queue, leaving the pure conversion to copy_page_from_iter() until the merge in April). Some didn't - bogus code with obvious intent got replaced with something that did what everyone assumed that bogus code to have been doing all along, without noticing a bug concealed from everyone (including the original author) by the convolutions in old variant...
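To illustrate the iterator style described in this aside, here is a user-space sketch (all names invented; the kernel's real struct iov_iter is more elaborate, but the principle is the same): progress lives in the iterator, the iovec array is never written to, and a failed attempt can therefore be retried without corrupting anything.

    #include <string.h>
    #include <sys/uio.h>

    /* Iterator over an iovec array that the iterator never modifies. */
    struct iter {
        const struct iovec *iov;   /* never written through */
        int nr;                    /* number of iovecs in the array */
        int idx;                   /* current iovec */
        size_t off;                /* offset within the current iovec */
    };

    /* Copy up to 'len' bytes and advance the iterator; returns the number of
     * bytes actually copied.  Bounds are enforced by the idx < nr test here,
     * rather than by every caller getting its arithmetic right. */
    static size_t copy_from_iter_sketch(char *dst, size_t len, struct iter *it)
    {
        size_t done = 0;
        while (len > 0 && it->idx < it->nr) {
            const struct iovec *v = &it->iov[it->idx];
            size_t avail = v->iov_len - it->off;
            size_t n = len < avail ? len : avail;
            memcpy(dst + done, (const char *)v->iov_base + it->off, n);
            it->off += n;
            if (it->off == v->iov_len) {    /* this iovec is exhausted */
                it->idx++;
                it->off = 0;
            }
            done += n;
            len -= n;
        }
        return done;
    }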
And no, it wasn't a coverup - cifs bugs noticed at the same time got reported as such; I wouldn't have hesitated to do the same to pipe one had I noticed it. Code in cifs_iovec_write() was simpler, so the bogosity there got spotted...
Posted Mar 24, 2016 9:28 UTC (Thu)
by ortalo (guest, #4654)
[Link]
Multiple lines of defense are not superfluous. Some people sometimes qualify successive security protections with varying names, like excessive security, redundant controls, double risk coverage, etc. I've seen this kind of discourse in many contexts (technical or non-technical), but in the first place such people are simply wrong (and should probably stop taking decisions or responsibilities in this area).
Redundant security measures are what allows you to stop losing the eternal chase between vulnerabilities and corrections, by tolerating the presence of some of the inevitable vulnerabilities without compromising security altogether.
Of course, there are useless defense-in-depth architectures, but they are like useless security mechanisms: useless.
Note that the accurate design of a security architecture implementing defense in depth is also entirely orthogonal to the problem of selecting good security mechanisms in the first place. (Those arguing for a single protection mechanism also frequently select inadequate ones, to add insult to injury.)
[1] Note that in the kernel this is debatable, though. I wonder if self-contradiction qualifies as a redundant control. :-)
Posted Mar 24, 2016 10:04 UTC (Thu)
by malor (guest, #2973)
[Link] (4 responses)
The policy of hiding security fixes is actively hurting end-users. Not all fixes get ported to all the various Linux forks. This will. not. change. Will. Not.
At least, by telling people that patches have a security context when you know they do, you increase the chance of that patch being brought to other kernel trees. Nobody is asking for security analysis if it doesn't already exist, just... share what you know. Hiding security patches is bullshit, and it's not working. There's a huge underground ecosystem that thrives on this, on noticing the things that kernel devs try to hide.
So stop trying to hide things. It would have a wildly disproportionate positive impact for the amount of mental energy expended.
Posted Mar 24, 2016 11:27 UTC (Thu)
by khim (subscriber, #9252)
[Link] (2 responses)
"Not all fixes get ported to all the various Linux forks. This will. not. change. Will. Not."
Indeed. And that's fine. If people like to live with a known devil (and known vulnerabilities) then it's their choice.
"While it's not relevant in this specific case, it would be very good if bugs that are known to have security impacts are labeled as such."
Why? As this example very succinctly demonstrates, this wouldn't change anything: either you follow the ToT or you are broken. Heck, you are broken even if you do follow the ToT; what chance is there for you to remain protected if you decide to stay in the past? The whole security model is broken. If we care about security then we should redesign our toys (most likely we'll need microkernels and hardware support to make them fast); if we don't (and we don't: we complain loudly when there's a new vulnerability found, but if someone offers to sacrifice even just 5% of efficiency... no, we, as a society, don't want to pay for security... at all), then we don't. It's as simple as that.
Posted Mar 24, 2016 13:05 UTC (Thu)
by ortalo (guest, #4654)
[Link] (1 responses)
The problem is we need to change the decisions of the people currently holding the design authority on our toys (both from the hardware and the software point of view).
First, I guess we should point out that the problem is *not* that these top-level executives or government decision-makers do not want to pay for security.
It was/is that they only want to pay for *their* protection (using corporate or public money, btw) and now they select the risks they want to cover much too narrow-mindedly or with much too little adequate information.
Due to that, executives are having everybody else take risks and they are also taking too much risk for their organizations (either companies or administrations).
The risk selection is catastrophic at the decision-taking level for the moment in computer security (and certainly in security generally too), because it has degraded continuously for... maybe two decades.
By the way, it seems to me some US companies are among the first to have started to realize that: that the risks of their customers are also necessarily theirs to some extent. That they do not live in isolation. But apparently, as private companies, they cannot integrate the full spectrum of the risk management (because of, you know, "business" reasons and the learning disabilities money-making seems to create).
Anyway, they should not be allowed to police these risk-management decisions alone - because they are only businesses after all.
But then, we are left with governments to actually do that. And they will annoyingly be a part of the problem until they really decide to acquire recent and accurate knowledge of the issues involved.
(Sidenote for the curious executive: No, restarting a second Crypto War debate does not qualify as such at all... Nor does multiplying inefficient security controls around only your own neighbourhood. Until you actually try to learn, you will not learn anything new...)
Posted Mar 28, 2016 0:48 UTC (Mon)
by dps (guest, #5725)
[Link]
This is the sort of logic that RIM used: "The BlackBerry server is insufficiently secure for use in a DMZ network. We suggest you put it in your internal network." When I explained that this amounted to "The BlackBerry server is insecure. We suggest you put it where the damage is maximised.", and that no server anywhere should have security issues beyond those implied by the services provided, it became an easy decision not to spend a large amount of money on that box.
There is also the issue that all their employees need security and not just the chief executive, and just because you are the CEO does not make security optional.
Posted Mar 28, 2016 7:07 UTC (Mon)
by Seegras (guest, #20463)
[Link]
Yes. But why should anyone run some kind of archaic kernel in the first place? A year after the fixed one was released?
Posted Mar 25, 2016 13:23 UTC (Fri)
by BenHutchings (subscriber, #37955)
[Link] (1 responses)
Unfortunately the fix by Seth Jennings for RHEL, later applied to stable branches, was still incorrect, leading to CVE-2016-0774. I hope AOSP picks up the second fix as well.
Posted Mar 26, 2016 18:43 UTC (Sat)
by thestinger (guest, #91827)
[Link]
https://android-review.googlesource.com/#/c/210720/
https://android-review.googlesource.com/#/c/210730/
I'm sure they're missing a lot more than this though. Maybe one day I'll go through the stable fixes for 3.10 and submit the relevant ones to them. It would be really nice if they pulled in all the stable kernel fixes. I can understand that they'd want to do it themselves, but they really aren't doing a good job.
A while ago, I noticed that they somehow had a vulnerability fix for their own Android extension (VMA naming) in most of their 3.10 kernels that was missing for 3.4. I had run into the vulnerability by writing buggy userspace code triggering the crash. They paid me a bug bounty for reporting it (was not expecting that at all)... so it's hard to complain.