The 2.6.24 merge window is open and, as of this writing, some 5,600 patches
have found their way into the mainline. As usual, the list of changes is
extensive. Some of the highlights among user-visible changes are:
New drivers have been added for
Toshiba TCM825x cameras (as found in the Nokia N800),
Conexant cx23415 MPEG encoders (in framebuffer mode),
Dual-DVB-T tuners,
DiBcom DiB0070 tuners,
Microtune MT2131 tuners,
Samsung S5H1409 demodulators,
Conexant CX23885/CX23887 PCIe bridge devices,
Samsung LTV350QV LCD panel backlights,
Kingsun KS-959 and "Dazzle" IrDA USB dongles,
ADMtek ADM8211-based wireless network adapters,
Marvell Libertas 8385 CF wireless adapters,
Intel 82598-based 10GbE network cards,
Intel PRO/Wireless 3945ABG/BG and Link AGN adapters (finally),
IP1000 Gigabit Ethernet cards,
ICH9 on-board Ethernet adapters,
Tehuti Networks 10G Ethernet adapters,
Sonics Silicon Backplane busses,
several Ralink wireless adapters,
"EMAC" built-in PowerPC ethernet controllers,
Sun Neptune Ethernet adapters,
Winchiphead CH341 USB-RS232 converters,
Atmel AT32AP7000 USB device controllers,
Blackfin bf548 ATAPI controllers,
Atmel AVR32 parallel ATA controllers,
National Semiconductor NS87415 parallel ATA controllers,
Olympus MAUSB-10 and Fujifilm DPC-R1 flash card readers (which have
the nice feature of allowing direct flash access without an
intervening translation layer),
TI DaVinci I2C controllers,
Analog Devices ADT7470 temperature monitoring chips,
IBM PowerExecutive power/temperature sensors,
TI AR7 CPMAC Ethernet controllers,
Siemens SX1 phones,
Dallas Semiconductor DS1374 realtime clock chips,
Atmel AT73C213 external sound devices,
Cirrus Logic CS4270 codec devices,
Gallant SC-6000 Audio Excel DSPs, and
Atmel AT32AP and AT91 on-chip synchronous serial controllers.
The Broadcom BCM43xx driver has been replaced by a new version which
uses the mac80211 layer. Actually, there's two drivers: "b43" for
newer adapters, and "b43legacy" for older 802.11b and 802.11g
devices.
The "dgrs" Digi RightSwitch driver has been removed from the kernel.
This product, evidently, was never actually sold, so there should not
be a whole lot of users inconvenienced by this change.
The kernel now has basic support for SDIO peripherals. There is also
now driver support for MMC/SD cards accessed via SPI controllers.
The CFS group scheduling
code has been merged. As of this writing, though, the feature cannot
actually be turned on because the control groups code has not yet been
merged. There is also a per-UID fair scheduling option which does work now.
There is now support for RPC-based RDMA and the ability to mount NFS
filesystems using RDMA.
The traffic shaper, which limits bandwidth usage on network links, has
been marked obsolete and scheduled for removal in 2.6.25. The much
more flexible qdisc subsystem should be used instead.
Allocation of UDP port numbers is now randomized.
The netconsole code can now support multiple logging targets.
Support for network namespaces has been added, enabling the
virtualization of network-related resources in containers. Also
merged is a virtual Ethernet driver which can be used to create
network tunnels into (and out of) containers.
The Authenticated chunks
protocol for the stream control transmission protocol (SCTP) is supported.
A new "stateless NAT" implementation performs IPv4 network address
translation in a much more resource-efficient manner.
A new hidraw device provides access to a stream of
unprocessed input device events for applications which have special
needs in this area.
The per-device write
throttling patches have been merged; these patches should help the
system keep heavy traffic on one block device from starving other
devices. The floating
proportions patch, needed to support per-device throttling, has
also gone in.
There is a new sysctl flag for the out-of-memory killer
(oom_kill_allocating_task). If this flag is set, the OOM
killer will simply kill the process whose allocation brings about the
out-of-memory situation instead of scanning through the system looking
for better targets.
Disk quota messages can now be delivered via a netlink socket. This
should make it easier for graphical environments to inform the user
when disk quota problems are encountered.
The new F_DUPFD_CLOEXEC command causes fcntl() to
duplicate a file descriptor and set the close-on-exec flag from the
beginning.
Block reservations have been added to the ext2 filesystem.
The Linux security module interface is now a non-module interface: the
ability to load security modules on the fly has been removed.
Important changes visible to kernel developers include:
As expected, the i386/x86_64
architecture merger has happened. The result is a single
architecture, called "x86," which can be built for 32-bit and 64-bit
processors.
The Video4Linux layer has some new internal support for composite
devices involving more than one driver (many V4L2 devices involve, at
a minimum, separate drivers for the controller and the sensor).
Also in Video4Linux: the video-buf layer has been replaced with a more
generic implementation which works with a wider range of devices
(including USB devices and those which do not support scatter/gather
DMA).
The NAPI interface used in network drivers has been reworked to better
support devices with multiple transmit queues.
The networking layer has a new function for printing MAC addresses:
char *print_mac(char *buf, const u8 *addr);
The buf buffer should be declared with
DECLARE_MAC_BUF(); the output is suitable for formatting in
printk() with "%s".
The NETIF_F_LLTX (lockless transmit) flag for network devices
has been deprecated and should not be used in new code.
The functions ktime_sub_us() and ktime_sub_ns() have
been added; they subtract the given number of microseconds or
nanoseconds from a ktime_t value.
The hard_header() method has been removed from struct
net_device; it has been replaced by a per-protocol
header_ops structure pointer.
The debugfs filesystem has some new functions
(debugfs_create_x8(), debugfs_create_x16(),
debugfs_create_x32()) which make it easy to export files
containing hexadecimal numbers.
Various small sysfs-related API changes have been made. The
name field has been removed from the kobject
structure. The prototypes of the user-event callbacks have been
changed. Many of the subsystem-related calls have been removed -
subsystems never really did much of anything anyway;
get_bus() and put_bus() are also gone.
A new value DMA_MASK_NONE can be stored in the
device structure dma_mask field to indicate that the
device is incapable of performing DMA.
The VFS has a couple of new address space operations
(write_begin() and write_end()) aimed at fixing some
deadlock scenarios; see this article
for more information.
The scatterlist chaining
patches have been merged and many parts of the kernel have been
updated to use this feature.
The CFLAGS= and CPPFLAGS= options now work with the
kernel build system in the expected way: they add flags to be passed
to the C compiler and preprocessor, respectively.
The prototype for slab constructor callbacks has changed to:
The unused flags argument has been removed and the order of
the other two arguments has been reversed to match other slab
functions.
The DECLARE_MUTEX_LOCKED() macro has been removed.
The long-deprecated SA_* interrupt flags have been removed in
favor of the IRQF_* equivalents.
A number of block layer utilities have seen prototype changes. The
most evident change, perhaps, is bio_endio() and the
associated bio_end_io_t callback:
void bio_endio(struct bio *bio, int error);
typedef void (bio_end_io_t) (struct bio *, int);
These functions now always completes the entire BIO, so the size
argument has been removed.
As of this writing, the 2.6.24 merge window can be expected to remain open
for up to another week. So expect more changes to go into the mainline
before this development cycle goes into the stabilization phase.
Posted Oct 18, 2007 9:45 UTC (Thu) by mingo (subscriber, #31122)
[Link]
The CFS group scheduling code has been merged. There is also a mechanism for per-UID group scheduling. As of this writing, though, the feature cannot actually be turned on because the control groups code has not yet been merged.
Small correction: it can already be turned on via the CONFIG_FAIR_GROUP_SCHED=y config option - this is a feature that has been added recently.
With this feature enabled the kernel will schedule based on UIDs too: if one user runs 500 tasks and the other user runs only 2 tasks, the two users will still share the CPU(s) fairly, with both getting 50% of CPU time. (I have also tested this with fork bombs: if a user is running a fork bomb then other users are not affected at all [CPU scheduling wise], and the admin can log in and disable the user as well.)
The scheduling weight of each UID defaults to 100%, and it can be dynamically reconfigured via sysfs: /sys/kernel/uid/*/cpu_share - a value of 1024 means 100% weight. So by echoing 2048 into that file a user will get twice as much CPU time as a user who has 1024 cpu_share value.
An additional mechanism is that uevents can be used to configure UIDs as they get created. None of this needs the control groups / containers APIs.
Group scheduling
Posted Oct 18, 2007 11:36 UTC (Thu) by mingo (subscriber, #31122)
[Link]
/sys/kernel/uid/*/cpu_share
Correcting myself: the right path is /sys/kernel/uids/*/cpu_share
Group scheduling
Posted Oct 18, 2007 11:56 UTC (Thu) by paravoid (subscriber, #32869)
[Link]
While this sounds interesting, I wonder what it's implications going to be on systems that
have not designed with that in mind.
e.g. a random process run by a user getting 50% while N processes of apache take the other
50%...
Group scheduling
Posted Oct 18, 2007 15:33 UTC (Thu) by mingo (subscriber, #31122)
[Link]
yes - anything that changes previous behavior can be "bad" for someone who expects exactly the same behavior as before.
that's why there is a config option for it that can be turned on or off. (and that's why the exact weight is configurable - often the benefits heavily outweigh the changed behavior and it's better to just adjust the defaults a bit - by for example giving the httpd user a higher weight.)
Group scheduling
Posted Oct 18, 2007 12:51 UTC (Thu) by corbet (editor, #1)
[Link]
It's the per-UID scheduling which can be turned on now, right? When I added that sentence I obviously didn't take a proper look at how the paragraph reads as a whole...
Group scheduling
Posted Oct 18, 2007 15:30 UTC (Thu) by mingo (subscriber, #31122)
[Link]
It's the per-UID scheduling which can be turned on now, right?
yeah.
When I added that sentence I obviously didn't take a proper look at how the paragraph reads as a whole...
Indeed - and you describe it all in more detail in the larger article about group scheduling. But i only saw that after writing this comment :-/
Merged for 2.6.24
Posted Oct 18, 2007 12:27 UTC (Thu) by cathectic (subscriber, #40543)
[Link]
Second bullet point - s/b32legacy/b43legacy/
Biggest set of changes since 2.6's initial release?
Posted Jan 27, 2008 17:12 UTC (Sun) by AnswerGuy (guest, #1256)
[Link]
Do you all think it would be fair to say that 2.6.24 represents the biggest set of changes
(at least user visible changes) since 2.6 was released?
For the last few years I've been trapped in "RHEL Hell" (happily working, but at a company
which only supports RHEL, SLES and relatively old versions of those). Most of my day-to-day
work has been focused in user space and on 2.4.21-*bleach!!!* kernels (with some
2.6.9-*almost_tolerable*). So I haven't kept up on the newer releases much.
However, I have glanced over each new release and this this one seems to be pretty extensive.
JimD