In brief
That is normally the right thing to do; better to put the CPU time to good use than to have the processor go idle while processes want to run. But there are, it seems, situations where system administrators would rather not hand out excess CPU time in that way. If, for example, the processes belong to a customer who is paying for a certain amount of processing time, giving away more could be bad business. To keep this from happening, Bharata B Rao has created the CFS hard limits patch set. Hard limits are managed using control groups; they allow the administrator to set an absolute limit on the amount of CPU time the control group as a whole is able to use over a given period of real time. Billing users who want their limit raised is, of course, a user-space policy issue, so it's not part of this patch.
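The effect of such a limit can be sketched with a toy simulation. This is a minimal illustration only, not the patch set's interface: the function name and the `quota_ms` and `period_ms` parameters are hypothetical, and the model assumes a single always-runnable group on an otherwise idle CPU, accounted in 1 ms ticks.

```python
def cpu_time_granted(quota_ms: int, period_ms: int, total_ms: int) -> int:
    """Return the CPU time (ms) a hard-limited group receives over total_ms.

    Hypothetical model: the group always wants to run and nothing else
    competes for the CPU, so any time it is denied simply goes idle.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")

    used_in_period = 0  # CPU time consumed in the current period
    consumed = 0        # CPU time consumed overall

    for tick in range(total_ms):
        # A new period of real time begins: the budget is refilled.
        if tick % period_ms == 0:
            used_in_period = 0

        if used_in_period < quota_ms:
            used_in_period += 1
            consumed += 1
        # Otherwise the group is throttled until the next period,
        # even though the processor has nothing else to do.

    return consumed


if __name__ == "__main__":
    # A 250 ms limit per 1000 ms period, observed for three seconds.
    print(cpu_time_granted(250, 1000, 3000))
```

Without the limit, the same group would have received all 3000 ms in this model; with it, the remaining time goes unused, which is exactly the trade-off described above.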
Discard again. The "discard" operation, which informs a block storage device that specific blocks are no longer in use, should help a wide variety of storage technologies - including solid-state devices and "thin provisioned" arrays - to perform better. But discard itself has some performance issues; see "The trouble with discard" for details.
Christoph Hellwig is trying to improve discard performance with a new set of patches, some of which originally come from Matthew Wilcox. These changes allow discard requests to cover much larger sections of the storage device; previously they had been limited by the maximum request size for the device. When combined with the XFS-specific XFS_IOC_TRIM ioctl() command, this change allows user-space to issue bulk discard operations for all of the free portions of a filesystem partition at an opportune time. The patches also add better control over whether any specific discard request should be seen as a queue barrier and whether it should be performed as a blocking operation.
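To see why a per-request size cap matters, consider what it does to one large free range. The following is a rough sketch, not kernel code: `split_discard()` is a hypothetical helper, and the sector counts in the example are made up for illustration.

```python
def split_discard(start: int, length: int, max_len: int) -> list[tuple[int, int]]:
    """Split one discard range into (start, length) requests.

    Each request covers at most max_len sectors, mimicking a cap
    imposed by the device's maximum request size (an assumption of
    this sketch, with illustrative units).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    requests = []
    end = start + length
    while start < end:
        chunk = min(max_len, end - start)
        requests.append((start, chunk))
        start += chunk
    return requests


if __name__ == "__main__":
    # With a small cap, one free range turns into many requests...
    print(len(split_discard(0, 1_000_000, 1024)))
    # ...while a cap at least as large as the range needs only one.
    print(len(split_discard(0, 1_000_000, 1_000_000)))
```

The fewer requests a bulk discard needs, the less per-request overhead it incurs, which is the motivation for letting a single discard cover a larger section of the device.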
Upcoming network driver API change. Not content with having reworked the network driver API once (by moving operations into their own structure), Stephen Hemminger now has a new patch set which changes the API implemented by all drivers. The function involved is ndo_start_xmit(), which is used by the networking layer to pass a packet to the driver for transmission. This function should really only return one of two values: NETDEV_TX_OK (meaning that the packet has been accepted and queued for transmission) or NETDEV_TX_BUSY (the packet was not accepted because the queue was full or some similar problem came up). Drivers using the deprecated LLTX mode can also return NETDEV_TX_LOCKED to indicate that the transmit lock was already taken.
The problem is that the return type for ndo_start_xmit() was defined as int; some driver writers thought that meant they could return arbitrary error codes to the networking layer. With Stephen's patch, the return type becomes netdev_tx_t, an enum containing only the defined return codes. That should catch any driver writers who try to return the wrong thing - but at the cost of changing a lot of drivers.
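The idea of restricting the return value to a closed set of codes can be modeled outside of C. The sketch below is only an analogy: `NetdevTx`, `dispatch()`, and both driver functions are hypothetical names, the enum values are placeholders rather than the kernel's values, and the check here happens at run time, whereas the kernel change relies on the C type of the function.

```python
from enum import Enum, auto


class NetdevTx(Enum):
    """Stand-in for netdev_tx_t; values are placeholders, not kernel values."""
    OK = auto()      # packet accepted and queued for transmission
    BUSY = auto()    # packet not accepted, e.g. the queue was full
    LOCKED = auto()  # LLTX mode only: transmit lock already taken


def dispatch(start_xmit, packet):
    """Call a driver's transmit hook and reject any result outside the enum."""
    result = start_xmit(packet)
    if not isinstance(result, NetdevTx):
        raise TypeError(f"driver returned {result!r}, expected a NetdevTx member")
    return result


def good_driver(packet):
    # Returns one of the defined codes.
    return NetdevTx.OK


def sloppy_driver(packet):
    # Returns an arbitrary error code, as an int return type allowed.
    return -12


if __name__ == "__main__":
    print(dispatch(good_driver, b"payload"))
    try:
        dispatch(sloppy_driver, b"payload")
    except TypeError as err:
        print("rejected:", err)
```

With a plain int return type, nothing distinguishes the second driver from the first until the networking layer misinterprets the value; a dedicated type makes the contract explicit.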
Checkpoint/restore wiki. There is a new wiki dedicated to the collection of information about the rapidly-developing checkpoint/restore functionality. It's a little bare at the moment, but, one assumes, it will soon be filled with information about this feature.
The actual checkpoint/restore task remains an exercise in complexity. As an example, consider one of the most recently-posted pieces: checkpoint and restore for security credentials. It requires a number of hooks into LSM modules to obtain the current security state, serialize it, and to restore it at some future time. It can all probably be made to work, but long-term maintenance could prove to be painful.
The BFS scheduler. Con Kolivas, who worked on desktop interactivity issues before abruptly leaving the kernel development community in 2007, has posted a new scheduler called BFS. Con says:
(See the original LWN posting for the associated comment thread.)
