refcount_t meets the network stack
The patch in question converts the network stack to the new refcount_t type introduced for 4.11. This type is meant to take over reference-count duties from atomic_t, adding, in the process, checks for overflows and underflows. A number of recent kernel exploits have taken advantage of reference-count errors, usually as a way to provoke a use-after-free vulnerability. By detecting those problems, the refcount_t type can close off a whole family of exploit techniques, hardening the kernel in a significant way.
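For those who have not yet seen the new type, a conversion typically looks something like the sketch below. The struct my_obj object and its helpers are invented for illustration; only the refcount_* calls are the actual <linux/refcount.h> interface, and the details of real conversions naturally vary.

```c
#include <linux/refcount.h>
#include <linux/slab.h>

/* Hypothetical object used only to illustrate the conversion. */
struct my_obj {
	refcount_t refcnt;	/* was: atomic_t refcnt; */
	/* ... payload ... */
};

static struct my_obj *my_obj_alloc(void)
{
	struct my_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

	if (obj)
		refcount_set(&obj->refcnt, 1);	/* was: atomic_set() */
	return obj;
}

static void my_obj_get(struct my_obj *obj)
{
	/* was: atomic_inc(); now complains rather than wrapping on overflow */
	refcount_inc(&obj->refcnt);
}

static void my_obj_put(struct my_obj *obj)
{
	/* was: atomic_dec_and_test(); now complains on underflow */
	if (refcount_dec_and_test(&obj->refcnt))
		kfree(obj);
}
```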
Networking developer Eric Dumazet was quick to point out the cost of switching to refcount_t: what was once a simple atomic operation becomes an external function call with added checking logic, making the whole thing quite a bit more expensive. In the high-speed networking world, where the processing-time budget for a packet is measured in nanoseconds, this cost is more than unwelcome. And, it seems, there is a bit of wounded pride mixed in as well.
But, as Kees Cook pointed out in his reply, it may well be time to give up a little pride, and some processor time too.
Making the kernel more robust is a generally accepted goal, but that in itself is not enough to get hardening patches accepted. In this case, networking maintainer David Miller was quite clear on what he thought of this patch: "the refcount_t facility as-is is unacceptable for networking". That leaves developers wanting to harden reference-counting code throughout the kernel in a bit of a difficult position.
As it happens, that position was made even harder by two things: nobody has actually quantified the cost of the new refcount_t primitives, and there are no benchmarks that can be used to measure the effect of the changes on the network stack. As a result, it is not really even possible to begin a conversation on what would have to be done to make this work acceptable to the networking developers.
With regard to the cost, Peter Zijlstra ran some tests on various Intel processors. He concluded that the cost of the new primitives was about 20 additional processor cycles in the uncontended case. The contended case (where more than one thread is trying to update the count at the same time) is far more expensive with or without refcount_t, though, leading him to conclude that "reducing contention is far more effective than removing straight line instruction count". Networking developers have said in the past that the processing budget for a packet is about 200 cycles, so expending an additional 20 on a reference-count operation (of which there may be several while processing a single packet) is going to hurt.
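To get a feel for where those extra cycles go, here is a rough user-space approximation, written with GCC's atomic builtins rather than the kernel's actual code: the unchecked increment is a single locked add, while the checked version becomes a load, a couple of tests, and a compare-and-swap loop that may have to retry.

```c
/* User-space sketch of the cost difference; this shows the general shape of
 * a checked refcount increment, not the kernel's exact implementation. */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned int val; } myref_t;	/* stand-in for refcount_t */

/* Unchecked: compiles to a single "lock addl $1, (%rdi)" on x86. */
static void plain_inc(myref_t *r)
{
	__atomic_fetch_add(&r->val, 1, __ATOMIC_RELAXED);
}

/* Checked: load, test for use-after-free and saturation, then cmpxchg,
 * retrying if another CPU changed the count in the meantime. */
static bool checked_inc(myref_t *r)
{
	unsigned int old = __atomic_load_n(&r->val, __ATOMIC_RELAXED);

	do {
		if (old == 0)		/* incrementing a freed object? */
			return false;
		if (old == UINT_MAX)	/* saturated; refuse to wrap around */
			return true;
	} while (!__atomic_compare_exchange_n(&r->val, &old, old + 1, false,
					      __ATOMIC_RELAXED, __ATOMIC_RELAXED));
	return true;
}

int main(void)
{
	myref_t r = { .val = 1 };

	plain_inc(&r);
	printf("checked_inc on a live object: %d (count is now %u)\n",
	       checked_inc(&r), r.val);

	r.val = 0;	/* simulate a reference to an already-freed object */
	printf("checked_inc on a dead object: %d\n", checked_inc(&r));
	return 0;
}
```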
The only way to properly quantify how much it hurts, though, is with a test that exercises the entire networking stack under heavy load. It turns out that this is not easy to do; Dumazet admitted that "there is no good test simulating real-world workloads, which are mostly using TCP flows".
That news didn't sit well with Cook, who responded that "without a meaningful test, it's weird to reject a change for performance reasons". No such test has materialized, though, so it is going to be hard to say much more about the impact of the refcount_t changes than "that's going to hurt".
What might happen in this case is that the change to refcount_t could be made optional by way of a configuration parameter. That is expressly what the hardening developers wanted not to do: hardening code is not effective if it isn't actually running in production kernels. But providing such an option may be the only way to get reference-count checking into the network stack. At that point, it will be up to distributors to decide, as they configure their kernels, whether they think 20 cycles per operation is too high a cost to pay for a degree of immunity from reference-count exploits.
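What such an option might look like is sketched below; the configuration symbol and the wrapper names are purely hypothetical, invented here only to show the shape of the idea rather than any actual proposal.

```c
/* Hypothetical compile-time switch: when the (made-up) option is enabled,
 * networking reference counts use the checked refcount_t operations;
 * otherwise they fall back to plain atomic_t, as today. */
#ifdef CONFIG_NET_REFCOUNT_CHECKED		/* hypothetical Kconfig symbol */
typedef refcount_t	net_refcount_t;
#define net_refcount_set(r, n)		refcount_set(r, n)
#define net_refcount_inc(r)		refcount_inc(r)
#define net_refcount_dec_and_test(r)	refcount_dec_and_test(r)
#else
typedef atomic_t	net_refcount_t;
#define net_refcount_set(r, n)		atomic_set(r, n)
#define net_refcount_inc(r)		atomic_inc(r)
#define net_refcount_dec_and_test(r)	atomic_dec_and_test(r)
#endif
```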
Posted Mar 30, 2017 22:56 UTC (Thu)
by gerdesj (subscriber, #5446)
[Link]
I'm a sysadmin by trade, and I suggest that something that offers additional security with a trade-off in speed (unquantified at the moment) is precisely the sort of option that I should be setting myself. I know where I'll be putting systems, what they are doing, and what threats they face.
Until this thing can actually be quantified in some way, it surely can't become a default.
Cheers
Posted Mar 31, 2017 7:14 UTC (Fri)
by nhippi (subscriber, #34640)
[Link] (4 responses)
In this particular case, I wonder why reference-counting bugs couldn't be found with static analysis - and whether making static analysis tools easier to use and part of the natural kernel developer workflow would have a better impact on kernel security than adding runtime complexity.
Posted Mar 31, 2017 12:38 UTC (Fri)
by mina86 (guest, #68442)
[Link] (3 responses)
Posted Mar 31, 2017 16:21 UTC (Fri)
by zlynx (guest, #2285)
[Link] (1 response)
Then why are you using Linux instead of a microkernel? The monolithic kernel design is all about performance.
In fact, that isn't going nearly far enough for security! You should be using nested virtual machines, and inside those running QEMU emulating an encrypted instruction set of your own making!
Now, how far down the rabbit hole of "security" do you want to go? At SOME point performance takes over and you have to say "No more security!"
Posted Apr 3, 2017 11:20 UTC (Mon)
by mina86 (guest, #68442)
[Link]
Because there’s no working distribution with a microkernel.
> The monolithic kernel design is all about performance.
It’s also easier to write.
> You should be using nested virtual machines
Yeah, I know. I just don’t have time to set up VMs for some of the more sensitive tasks I’m performing. Some day for sure…
> and inside those running QEMU emulating an encrypted instruction set of your own making!
I don't understand what that means or how that would increase security in a practical way.
> Now, how far down the rabbit hole of "security" do you want to go?
Definitely much deeper than what seems to be the standard nowadays.
> At SOME point performance takes over and you have to say "No more security!"
Sure. We are far from that point though.
Posted Apr 10, 2017 9:32 UTC (Mon)
by oldtomas (guest, #72579)
[Link]
Posted Mar 31, 2017 8:26 UTC (Fri)
by zenaan (guest, #3778)
[Link]
Posted Mar 31, 2017 10:39 UTC (Fri)
by jenro (subscriber, #7024)
[Link]
There are lots of Linux systems out there that would not even notice the additional execution time per packet, because they don't handle that much network traffic. And many of these low-volume network systems are of a kind that cannot be easily updated, or where the owners don't know how, or don't care, to update: smartphones, embedded or IoT devices, SOHO servers including NAS boxes, home routers, ...
These systems would profit from any additional security - especially when a whole class of exploits can be removed.
On the other hand, there are systems that must handle a really high volume of network traffic. Those who run this kind of system obviously need a capable admin who can keep the systems current, apply any security patches, and perform some additional setup tasks to optimize the systems for maximum network speed.
So my suggestion would be: make refcount_t a compile-time option in the network stack and set the default to on. And encourage everybody to leave the default on, so that most systems are more secure by default.
Those who really need the last bit of performance and are willing to take the risk would have to take additional setup steps, maybe recompiling the kernel with a different config. Distributions could help by documenting the needed steps.
Posted Apr 1, 2017 8:36 UTC (Sat)
by zyzzyva (guest, #107472)
[Link] (1 response)
Why couldn't the checked increment simply be implemented as:

    lock incl %0
    jo <somewhere>

That would be nearly the same performance as the unchecked version and the performance debate would basically go away.
Posted Apr 1, 2017 10:40 UTC (Sat)
by PaXTeam (guest, #24616)
[Link]
also i'd like to dispel a few myths that i keep seeing spread about this refcount_t API. first, it's not a refcount API at all, it's just another low level plumbing layer on top of another low level plumbing layer, without making any higher level user's life easier. second, beyond its design mistake of replacing single instruction atomic operations with a bloated cmpxchg loop (which won't look much nicer on LL/SC archs either, since they already have their own hand-optimized loops for atomic ops), the implementation also suffers from absolutely bogus 'security' checks. in particular, the checks against a 0 refcount value show a lack of understanding of what happens during a refcount underflow based exploit. hint: object reuse means the refcount field won't be 0 and thus the checks will never fire.