By Jonathan Corbet
September 2, 2009
CFS hard limits. The Linux "completely fair scheduler" works by
dividing the available CPU time between the processes contending for
it. In many situations, though, processes running on the system will not
actually use their full fair share; they may spend enough time waiting for
I/O, for example, that they simply cannot run enough to use all of the time
they are entitled to. In such situations, CFS will give the left-over time
to more CPU-intensive processes that can make good use of it, even if those
processes have exceeded their allocation.
That is normally the right thing to do; better to put the CPU time to good
use than to have the processor go idle while processes want to run. But
there are, it seems, situations where system administrators would rather
not hand out excess CPU time in that way. If, for example, the processes
belong to a customer who is paying for a certain amount of processing time,
giving away more could be bad business. To keep this from happening,
Bharata B Rao has created the CFS
hard limits patch set. Hard limits are managed using control groups;
they allow the administrator to set an absolute limit on the amount of CPU
time the control group as a whole is able to use over a given period of
real time. Billing users who want their limit raised is, of course, a
user-space policy issue, so it's not part of this patch.
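
As a rough illustration of the idea (not the patch set's implementation), a hard limit can be modeled as a per-period budget that is never topped up from idle time. The class and attribute names below are hypothetical, and the microsecond values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class HardLimitGroup:
    """Toy model of a control group with a hard CPU limit (hypothetical names)."""

    quota_us: int     # CPU time the group may consume in one period
    period_us: int    # length of the enforcement period, in real time
    used_us: int = 0  # CPU time consumed so far in the current period

    @property
    def limit_fraction(self) -> float:
        """Fraction of one CPU the group may use, averaged over a period."""
        return self.quota_us / self.period_us

    @property
    def throttled(self) -> bool:
        """True once the budget for the current period is exhausted."""
        return self.used_us >= self.quota_us

    def run(self, wanted_us: int) -> int:
        """Ask for wanted_us of CPU time; return how much is actually granted.

        Unlike plain fair-share scheduling, nothing extra is handed out even
        if the CPU would otherwise go idle.
        """
        remaining = self.quota_us - self.used_us
        granted = max(0, min(wanted_us, remaining))
        self.used_us += granted
        return granted

    def new_period(self) -> None:
        """Start the next period with a fresh budget."""
        self.used_us = 0


group = HardLimitGroup(quota_us=250_000, period_us=1_000_000)
first = group.run(200_000)   # fits within the budget
second = group.run(200_000)  # only the remainder is granted
```

In this model the second request is cut short and the group stays throttled until `new_period()` is called, whether or not anything else wants the CPU.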
Discard again. The "discard" operation, which informs a block
storage device that specific blocks are no longer in use, should help a
wide variety of storage technologies - including solid-state devices and
"thin provisioned" arrays - to perform better. But discard, itself, has
some performance issues; see the
trouble with discard for details.
Christoph Hellwig is trying to improve discard performance with a new set of patches, some of
which originally came from Matthew Wilcox. These changes allow discard
requests to cover much larger sections of the storage device; previously
they had been limited by the maximum request size for the device. When
combined with the XFS-specific XFS_IOC_TRIM ioctl()
command, this change allows user space to issue bulk discard operations for
all of the free portions of a filesystem partition at an opportune time.
The patches also add better control over whether any specific discard
request should be seen as a queue barrier and whether it should be
performed as a blocking operation.
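
To see why the request-size cap matters, the sketch below splits one discard range into requests no larger than a given maximum. It is a user-space model only: the function name and the sector counts are made up for illustration and do not come from the patches.

```python
def split_discard(start_sector, nr_sectors, max_sectors):
    """Split a discard range into (start, length) requests of at most
    max_sectors each. Hypothetical helper, not a kernel interface."""
    if max_sectors <= 0:
        raise ValueError("max_sectors must be positive")
    requests = []
    end = start_sector + nr_sectors
    while start_sector < end:
        length = min(max_sectors, end - start_sector)
        requests.append((start_sector, length))
        start_sector += length
    return requests


# The same free range needs far fewer requests when the cap is larger.
small_cap = split_discard(0, 1_000_000, 1_024)
large_cap = split_discard(0, 1_000_000, 1_000_000)
```

With the illustrative numbers above, the small cap produces hundreds of requests for a range that a large cap covers in one.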
Upcoming network driver API change. Not content with having
reworked the network driver API once (by moving operations into their own
structure), Stephen Hemminger now has a new patch set which changes
the API implemented by all drivers. The function involved is
ndo_start_xmit(), which is used by the networking layer to pass a
packet to the driver for transmission. This function should really only
return one of two values: NETDEV_TX_OK (meaning that the packet
has been accepted and queued for transmission) or NETDEV_TX_BUSY
(the packet was not accepted because the queue was full or some similar
problem came up). Drivers using the deprecated LLTX mode can also return
NETDEV_TX_LOCKED to indicate that the transmit lock was already
taken.
The problem is that the return type for ndo_start_xmit() was
defined as int; some driver writers thought that meant they could
return arbitrary error codes to the networking layer. With Stephen's
patch, the return type becomes netdev_tx_t, an enum
containing only the defined return codes. That should catch any driver
writers who try to return the wrong thing - but at the cost of changing a
lot of drivers.
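
The effect of a closed set of return codes can be modeled in user space. The sketch below is only an analogy: in the kernel the check comes from the C type system at build time, while here it is done at run time. The numeric values, the `checked_xmit()` wrapper, and both example drivers are assumptions made for illustration.

```python
from enum import IntEnum


class NetdevTx(IntEnum):
    """Model of the return codes named in the text (values are assumed)."""

    OK = 0       # packet accepted and queued for transmission
    BUSY = 1     # packet not accepted, for example because the queue is full
    LOCKED = -1  # LLTX only: the transmit lock was already taken


def checked_xmit(driver_xmit, packet):
    """Call a driver's transmit function and insist on a defined code."""
    result = driver_xmit(packet)
    if not isinstance(result, NetdevTx):
        raise TypeError(f"driver returned undefined code {result!r}")
    return result


def good_driver(packet):
    """Hypothetical driver that follows the rules."""
    return NetdevTx.OK


def bad_driver(packet):
    """Hypothetical driver that leaks an arbitrary negative error code."""
    return -12
```

Passing `bad_driver` to `checked_xmit()` raises `TypeError`, which is roughly the kind of mistake the typed return value is meant to expose.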
Checkpoint/restore wiki. There is a new wiki
dedicated to the collection of information about the rapidly-developing
checkpoint/restore functionality. It's a little bare at the moment, but,
one assumes, it will soon be filled with information about this feature.
The actual checkpoint/restore task remains an exercise in complexity. As
an example, consider one of the most recently-posted pieces: checkpoint and restore for security
credentials. It requires a number of hooks into LSM modules to obtain
the current security state, serialize it, and restore it at some future
time. It can all probably be made to work, but long-term maintenance could
prove to be painful.
The BFS scheduler. Con Kolivas, who worked on desktop interactivity
issues before abruptly leaving the kernel
development community in 2007, has posted a new
scheduler called BFS. Con says:
It was designed to be forward looking
only, make the most of lower spec machines, and not scale to massive
hardware. ie it is a desktop orientated scheduler, with extremely low
latencies for excellent interactivity by design rather than 'calculated',
with rigid fairness, nice priority distribution and extreme scalability
within normal load levels.
(See the original LWN posting
for the associated comment thread.)