refcount_t meets the network stack
The patch in question converts the network stack to the new refcount_t type introduced for 4.11. This type is meant to take over reference-count duties from atomic_t, adding, in the process, checks for overflows and underflows. A number of recent kernel exploits have taken advantage of reference-count errors, usually as a way to provoke a use-after-free vulnerability. By detecting those problems, the refcount_t type can close off a whole family of exploit techniques, hardening the kernel in a significant way.
Networking developer Eric Dumazet was quick to point out the cost of switching to refcount_t: what was once a simple atomic operation becomes an external function call with added checking logic, making the whole thing quite a bit more expensive. In the high-speed networking world, where the processing-time budget for a packet is measured in nanoseconds, this cost is more than unwelcome. And, it seems, there is a bit of wounded pride mixed in as well.
But, as Kees Cook pointed out in his reply, it may well be time to give up a little pride, and some processor time too.
Making the kernel more robust is a generally accepted goal, but that in itself is not enough to get hardening patches accepted. In this case, networking maintainer David Miller was quite clear on what he thought of this patch: "the refcount_t facility as-is is unacceptable for networking". That leaves developers wanting to harden reference-counting code throughout the kernel in a bit of a difficult position.
As it happens, that position was made even harder by two things: nobody had actually quantified the cost of the new refcount_t primitives, and there were no benchmarks that could be used to measure the effect of the changes on the network stack. As a result, it was not even really possible to begin a conversation on what would have to be done to make this work acceptable to the networking developers.
With regard to the cost, Peter Zijlstra ran some tests on various Intel processors. He concluded that the cost of the new primitives was about 20 additional processor cycles in the uncontended case. The contended case (where more than one thread is trying to update the count at the same time) is far more expensive with or without refcount_t, though, leading him to conclude that "reducing contention is far more effective than removing straight line instruction count". Networking developers have said in the past that the processing budget for a packet is about 200 cycles, so expending an additional 20 on a reference-count operation (of which there may be several while processing a single packet) is going to hurt.
The only way to properly quantify how much it hurts, though, is with a test that exercises the entire networking stack under heavy load. It turns out that this is not easy to do; Dumazet admitted that "there is no good test simulating real-world workloads, which are mostly using TCP flows".
That news didn't sit well with Cook, who responded that "without a meaningful test, it's weird to reject a change for performance reasons". No such test has materialized, though, so it is going to be hard to say much more about the impact of the refcount_t changes than "that's going to hurt".
What might happen in this case is that the change to refcount_t could
be made optional by way of a configuration parameter. That is expressly
what the hardening developers wanted not to do: hardening code is
not effective if it isn't actually running in production kernels. But
providing such an option may be the only way to get reference-count
checking into the network stack. At that point, it will be up to
distributors to decide, as they configure their kernels, whether they think
20 cycles per operation is too high a cost to pay for a degree of immunity
from reference-count exploits.
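If the checks do end up behind a configuration parameter, it would presumably follow the usual Kconfig pattern. A hypothetical sketch of what such an entry might look like (the option name and help text are invented here for illustration; they are not taken from any posted patch):

```
# Hypothetical example only: no such option is described in the article.
config REFCOUNT_NET_CHECKS
	bool "Full reference-count checking in the network stack"
	default y
	help
	  Use the checked refcount_t operations in the networking code,
	  trading roughly 20 cycles per operation for protection against
	  reference-count overflow and underflow bugs.

	  If unsure, say Y.
```

A distributor building a kernel for packet-forwarding appliances might turn this off, while a general-purpose distribution would likely leave the default in place.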
