
Kernel development

Brief items

Kernel release status

The current 2.6 development kernel remains 2.6.25-rc8; there have been no kernel releases over the last week. The patch rate into the mainline repository has slowed considerably, but the current regression list suggests that the 2.6.25 release is not yet imminent.

Comments (1 posted)

Kernel development news

Quotes of the week

The big item (in more ways than one) for this release is the addition of s390 support. As it is not actually provided in the tarball, you will need to use git to fetch it. You will also need a mainframe.
-- Avi Kivity finally brings virtualization to s390

err = -ENOBUFS; /* PS. You suck! */
-- Rusty Russell invents enhanced error codes

Comments (2 posted)

A Linux Driver Project status report

Greg Kroah-Hartman has sent out a lengthy report on the state of the Linux Driver Project. "The main problem is a lack of projects. It turns out that there really isn't much hardware that Linux doesn't already support. Almost all new hardware produced is coming with a Linux driver already written by the company, or by the community with help from the company."

Full Story (comments: 53)

Memory allocation failures and scary warnings

By Jonathan Corbet
April 7, 2008
People who put their Linux systems under a certain amount of memory stress - and who look at their logfiles - may notice an occasional message indicating that a "page allocation failure" has occurred, followed by a scary backtrace. These people may also notice that, despite the apocalyptic appearance of this message, the world often fails to end. In fact, the system tends to carry on just fine. For this reason, Dave Jones, who probably gets ten emails for every backtrace generated on a Fedora system, has suggested that these messages are simply noise which should be removed. Whether that should really happen is not entirely clear, though; understanding why requires a bit of background.

In general, the kernel's memory allocator does not like to fail. So, when kernel code requests memory, the memory management code will work hard to satisfy the request. If this work involves pushing other pages out to swap or removing data from the page cache, so be it. A big exception happens, though, when an atomic allocation (using the GFP_ATOMIC flag) is requested. Code requesting atomic allocations is generally not in a position where it can wait around for a lot of memory housecleaning work; in particular, such code cannot sleep. So if the memory manager is unable to satisfy an atomic allocation with the memory it has in hand, it has no choice except to fail the request.
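
The canonical pattern, in rough form, looks like the sketch below; struct my_device and its rx_dropped counter are hypothetical stand-ins for a real driver's state:

    #include <linux/slab.h>

    /* Sketch: an allocation in interrupt context must tolerate failure. */
    static int handle_rx(struct my_device *dev, size_t len)
    {
	void *buf;

	/*
	 * No sleeping here, so the allocator cannot reclaim memory
	 * on our behalf and may simply fail.
	 */
	buf = kmalloc(len, GFP_ATOMIC);
	if (!buf) {
		/* Degrade gracefully: drop this packet, keep running. */
		dev->rx_dropped++;
		return -ENOMEM;
	}

	/* ... copy the packet into buf and hand it up the stack ... */
	return 0;
    }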

Such failures are quite rare, especially when single pages are requested. The kernel works to keep some spare pages around at all times, so the memory stress must be severe before a single-page allocation will fail. Multi-page allocations are harder, though; the kernel's memory management code tends to fragment memory, making groups of physically-contiguous pages hard to find. In particular, if the system is under pressure to the point that there is not much free memory available at all, the chances of successfully allocating two (or more) contiguous pages drop considerably.

Multi-page allocations are not often used in the kernel; they are avoided whenever possible. There are situations where they are necessary, though. One example is network drivers which (1) support the transmission and reception of packets too large to fit into a single page, and which (2) drive hardware which cannot perform scatter/gather I/O on a single packet. In this situation, the DMA buffers used for packets must be larger than one page, and they must be physically contiguous. This is a situation which will become less pressing over time; scatter/gather capability in the hardware is increasingly common, and drivers are being rewritten to make use of this capability. With sufficiently smart hardware, the need for multi-page allocations goes down considerably.

But all of that skirts around the main point, which is that kernel code is supposed to handle allocation failures properly. There is never any guarantee that memory will be available, so kernel code must be written defensively. Allocation failures must be handled without losing any more capability than is strictly necessary. If one assumes that kernel code is written correctly, there should be no need to issue warnings on allocation failures. Things should just continue to work, perhaps without users noticing at all.

And, in fact, things often do just work. But the discussion resulting from Dave's suggestion makes it clear that few developers are confident that all kernel code does the right thing in the face of memory allocation problems. In cases where an allocation failure is not dealt with correctly, the system may go down in random places, leaving few clues as to what really happened. In that kind of situation, the allocation failure warning may be the only useful information which survives the crash. For this reason, some people want to see the warnings left in place.

As it happens, the memory allocator supports a special bit (__GFP_NOWARN) which suppresses the warning when a specific allocation fails. So it has been suggested that allocations made from code which is known to handle failures properly should have __GFP_NOWARN set. That would kill the warnings in code known to do the right thing while leaving them in place for all other callers, presumably limiting the warnings to places where there might truly be a problem. Jeff Garzik strongly opposed this idea, though, saying that it clutters up the code and "punishes good behavior."
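
Under that scheme, code known to cope with failure would simply OR the flag into its allocation mask; a minimal sketch:

    buf = kmalloc(len, GFP_ATOMIC | __GFP_NOWARN);
    if (!buf)
	return -ENOMEM;	/* handled quietly; no backtrace is logged */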

The other reason given for keeping the warnings in place is to make it clear when a system is running under persistent memory pressure. Such systems will not be performing optimally; often there are changes which can be made to relieve the pressure and help the system to run more smoothly. So it has been suggested that the warning could be reduced in frequency and made less scary. Nick Piggin suggests:

So I think that the messages should stay, and they should print out some header to say that it is only a warning and if not happening too often then it is not a problem, and if it is continually happening then please try X or Y or post a message to lkml...

An alternative idea would be to keep some sort of counter somewhere which could be queried by curious system administrators.

Of course, the real solution is to ensure that all kernel code is robust in the face of allocation failures. This can be hard to do, since the error recovery paths in any code are not often exercised or tested. Fortunately, the fault injection framework can help in this situation. Kernel developers can use this framework to simulate allocation failures in specific regions of code, then watch to see what happens. Your editor's impression, though, is that relatively few developers are using this tool. So confidence in the kernel's handling of allocation failures may remain low, and the desire to keep the warning around may remain high.

Comments (2 posted)

Improving syncookies

April 9, 2008

This article was contributed by Patrick McManus

Back in 1997, TCP SYN flood attacks were all the rage among script kiddies. A SYN flood is a denial of service attack that uses up server resources by initiating, but not completing, a connection. Attacks via this method remain a problem today, though they are now more likely to be launched by sophisticated botnets than by an individual. A first-line defense against SYN floods is the syncookie. The syncookie was not designed for Linux specifically, but it found its way into kernel 2.1.44 via a patch from Andi Kleen.

This long-time feature generated some recent discussion when a patch was submitted adding syncookie support to IPv6. The patch has now been queued for acceptance but in discussion along the way the community also began to tackle some longstanding limitations of syncookies and reaffirmed how relevant the feature continues to be.

To fully describe syncookies, some background on how TCP uses a three-way handshake to establish a connection is in order. The first packet of any TCP session received by the server is known as the SYN packet because it carries the synchronize control flag. The SYN flag indicates that its sender wishes to open a new connection; the flag is only used during the opening sequence. The server responds with a packet also containing the SYN flag, because the connection needs to be opened in both directions. This second packet also carries the ACK flag and is known as the SYN-ACK; it serves both to open the connection from the server to the client and to acknowledge receipt of the opening packet from the other host. Finally, the client sends a bare ACK packet to the server to acknowledge receipt of the server-to-client SYN-ACK, and the connection is then fully established.

During a SYN flood a server receives the first packet of the three-way TCP handshake and responds with a SYN-ACK but no further data is ever received from the initiating client. When the SYN-ACK is generated most servers will also create an entry in the SYN queue. This queue is the waiting area for half-open connections awaiting handshake completion. The attacker intentionally orphans those entries and instead generates more SYN packets which in turn take up more entries in the queue. The server needs to wait for a long timeout before giving up and recovering the connection resources. During this time the attacker can flood it with many more half-open connections. Eventually the server runs out of resources and cannot accept any new connections without dropping some, perhaps legitimate, connection from the queue. Simple solutions such as placing a quota on the number of partially open connections per peer or using dynamically adjusted packet filters do not work because the SYN packets are easy to forge with fake source addresses.

A syncookie allows the server to defer using up any resources until the third packet in the three-way handshake has been received. At that time the peer's address has been mildly authenticated because the final packet in the handshake contains a reference to the sequence number that was sent by the server in the second packet. With this assurance, packet filters and resource quotas keyed to the peer's address will again be useful defenses against resource attacks.

The basic mechanism of the syncookie works by carefully manipulating the initial sequence number of the connection instead of choosing it at random. Upon receiving a SYN, the server encodes the vital information that would otherwise have been stored as state in the SYN queue. This encoded information is cryptographically hashed with a secret key to form the sequence number of the SYN-ACK sent to the client. The third packet of a legitimate handshake, which is the ACK from the client back to the server, contains this sequence number (plus one) in its acknowledgment number field. In this way all the information necessary to fully open the connection is presented back to the server without any state having to be maintained while the handshake completes.

The major downside to syncookies is that they only have space to encode the most basic of TCP handshake options. At the time of initial syncookie deployment this was not a large problem because the only option prominently in use at the time was the Maximum Segment Size (MSS) option. This option is provided to help the peer avoid unnecessary fragmentation by sending packets that the other end of the connection knows a priori are too large to cross its network. This is exactly the kind of information that is normally stored as state in the SYN queue. The syncookie designers knew that this option was important to performance and found 3 bits for it in the encoded syncookie. These bits are used to approximate the real value of the option to one of 8 common values.
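
In rough terms, cookie generation looks like the sketch below, which follows the classic construction rather than the actual Linux code; keyed_hash1(), keyed_hash2(), coarse_counter(), and struct conn_id are hypothetical stand-ins for the real primitives:

    /* mss_index is a 3-bit index into a table of 8 common MSS values. */
    static u32 make_syncookie(const struct conn_id *id, u32 client_isn,
			      u32 mss_index)
    {
	u32 t = coarse_counter();	/* slowly incrementing time counter */

	/*
	 * The result becomes the server's initial sequence number.  When
	 * the final ACK arrives carrying this value (plus one), the server
	 * recomputes the hashes, recovers t and mss_index, and thereby
	 * verifies that the peer really saw the SYN-ACK.
	 */
	return keyed_hash1(id) + client_isn + (t << 24)
	       + ((keyed_hash2(id, t) + mss_index) & 0x00ffffff);
    }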

In the intervening years new options have come into prominence, and these are not syncookie-compatible. The most important of them are the window scaling and Selective Acknowledgment (SACK) options. These features respectively allow the TCP congestion control window to grow beyond 64KB and make recovery from minor packet losses within those large windows more efficient. Without these features it is impossible to get good transfer rates on high-bandwidth or high-latency networks; many household broadband links require at least the window scaling option to fully utilize the connection. Due to this limitation, and the modest computational overhead of the cryptographic hash, the Linux stack only resorts to syncookie-based connections when the number of half-open connections exceeds a high watermark controlled by the net.ipv4.tcp_max_syn_backlog sysctl. These connections are less featureful than normal connections, but they are only resorted to when the queue would otherwise require active pruning.

It turns out that the cookie mechanism is only implemented for IPv4. Recently, Glenn Griffin posted patches that add IPv6 support for syncookies. Andi Kleen, author of the original syncookie patch, wondered whether the mechanism should be continued at all, much less added to IPv6:

Syncookies are discouraged these days. They disable too many valuable TCP features (window scaling, SACK) and even without them the kernel is usually strong enough to defend against syn floods and systems have much more memory than they used to be. So I don't think it makes much sense to add more code to it, sorry.

Andi's argument was three-pronged. His first point was about the reduced abilities of cookie-initiated connections, as already described in this article; over time the value of these options has increased, and therefore so has the cost of using syncookies. His second point was that Linux no longer uses all of the memory necessary for a full connection until the new connection is fully open. Instead it uses a "minisock" for that period: a 96-byte struct tcp_request_sock holding the minimum state necessary to get the connection fully opened, compared to the 1616 bytes of a fully established struct tcp_sock (both sizes measured on a 64-bit kernel). Finally, Andi pointed out that the queue management routines for an overloaded SYN queue are more sophisticated now than the dumb head-drop algorithm that was in place when syncookies were first deployed. The suggestion was that, in aggregate, these advances might make Linux robust enough without syncookies, which could therefore be removed altogether.

Instead of engaging in a theoretical discussion, some readers set up and ran their own experiments. One of the best parts of the Linux community is its tendency to put real data behind arguments. While there is often disagreement over the realism of the measured scenarios, the data points always help us better understand the dynamics of kernel code.

Willy Tarreau: My tests on an AMD LX800 with max_syn_backlog at 63000 on an HTTP reverse proxy consisted in injecting 250 hits/s of legitimate traffic with 8000 SYN/s of noise.[..] Without SYN cookies, the average response time was about 1.5 second and unstable (due to retransmits), and the CPU was set to 60%. With SYN cookies enabled, the response time dropped to 12-15ms only, but CPU usage jumped to 70%. The difference appears at a higher legitimate traffic rate.

Ross Vandegrift: Under no SYN flood, the server handles 750 HTTP requests per second, measured via httping in flood mode. With a default tcp_max_syn_backlog of 1024, I can trivially prevent any inbound client connections with 2 threads of syn flood. Enabling tcp_syncookies brings the connection handling back up to 725 fetches per second.

This data compellingly supports the continued value of the syncookie, and that position seems to have won the day. The IPv6 syncookie patches are now queued within the networking development tree for 2.6.26.

However, the biggest news is probably that this discussion brought renewed energy to the problem of lost handshake options. Florian Westphal and Glenn Griffin have recently presented a solution to the most damaging aspect of that problem too.

Their solution is to leverage the echoed TCP timestamp option in a way similar to how classic syncookies leverage the echoing of the SYN-ACK sequence number in the subsequent ACK. The timestamp option was introduced with RFC 1323 and is widely deployed on modern Linux, Windows, and FreeBSD (including OS X) systems. Its main purpose is to increase the frequency of round-trip time measurements in the presence of large congestion control windows.

Using the timestamp to preserve the window scale and SACK option values requires modifying the timestamp of the SYN-ACK packet to include the state necessary to support them. During a normal handshake the client will echo the modified timestamp value of the SYN-ACK packet back to the server as part of the timestamp option on the third part of the handshake and thus propagate the SACK and window scale information without keeping any state on the server.

To make room in the timestamp for this new information, its nine least significant bits are shaved off. The encoded representation of the window scale and SACK options is then transferred back and forth at the minor cost of reduced timestamp granularity during the handshake exchange: timestamps advance in 512-jiffy steps while the connection is being established.
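
The encoding might look roughly like the sketch below; the bit layout here is a guess for illustration, not necessarily the layout used by the posted patch:

    /* Sketch: pack the peer's options into the nine low bits shaved
     * off the SYN-ACK timestamp.  Field layout is illustrative only. */
    static u32 cookie_timestamp(u32 now, u32 wscale, int sack_ok)
    {
	u32 options = (sack_ok << 4) | (wscale & 0xf);

	/* Round down to a 512-jiffy boundary, then store the encoded
	 * options in the freed-up low bits. */
	return (now & ~0x1ffU) | options;
    }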

Below are two different TCP handshakes completed with syncookies and the timestamp patch. Note that the lowest bits of the SYN-ACK timestamp are the same in each handshake even at different points in time because each handshake uses the same SACK and window scaling options. As a result the timestamp values in each SYN-ACK are different but the lower nine bits share the same 0x166 value.

13:51:04.582464 IP 127.0.0.1.57985 > 127.0.0.1.4050: S 1061746051:1061746051(0)
           win 32792 <mss 16396,sackOK,timestamp 0xfffea013 0,nop,wscale 6>
13:51:04.582478 IP 127.0.0.1.4050 > 127.0.0.1.57985: S 2800702917:2800702917(0)
           ack 1061746052 win 32768 <mss 16396,sackOK,timestamp 0xfffe9f66 0xfffea013,nop,wscale 6>
13:51:04.582480 IP 127.0.0.1.57985 > 127.0.0.1.4050: . 
           ack 1 win 513 <nop,nop,timestamp 0xfffea013 0xfffe9f66>

13:59:19.047306 IP 127.0.0.1.45979 > 127.0.0.1.4050: S 218483035:218483035(0) 
           win 32792 <mss 16396,sackOK,timestamp 0x0001bed4 0,nop,wscale 6>
13:59:19.047320 IP 127.0.0.1.4050 > 127.0.0.1.45979: S 1141094138:1141094138(0)
           ack 218483036 win 32768 <mss 16396,sackOK,timestamp 0x0001bd66 0x0001bed4,nop,wscale 6>
13:59:19.047322 IP 127.0.0.1.45979 > 127.0.0.1.4050: . 
           ack 1 win 513 <nop,nop,timestamp 0x0001bed4 0x0001bd66>

While there is no guarantee that the timestamp option will be supported by every TCP peer, timestamps are widely deployed on the most common operating systems. Additionally, because timestamps, window scaling, and selective acknowledgments are all features related to high latency and bandwidth networks it would be unlikely to find an implementation that supported only a subset of these options.

One shortcoming of the scheme is that it is not general enough to be future-proof as new handshake based options may continue to be deployed. At this time the MSS, SACK, window scaling, and timestamp options are the only handshake options seen with any regularity other than the NOP option which is just used for packet alignment. However, the whole point of an extensible option scheme is to leave room for future improvements. The IANA registry that records option values was last updated in February 2007 to reserve option code 27 for use with Experimental RFC 4782 "Quick Start for TCP and IP". Only time will tell if that particular option will be the next challenge to the syncookie scheme or if something else will rise first.

The timestamp patch has only been posted very recently, and there has been little discussion of it beyond the developers who worked directly on it. It is not clear whether it will be accepted into the mainline right away, but it certainly seems to address a well-known core problem with the syncookie at a minor cost.

With the updates for IPv6 and modern TCP option schemes syncookies appear primed to keep providing sweet relief in their somewhat esoteric networking security niche. Perhaps they will keep chugging away for another 10 years without having to be re-baked.

Comments (8 posted)

vringfd()

By Jonathan Corbet
April 7, 2008
One of the core features of the (now stalled) kevent subsystem was a circular buffer intended for efficient movement of data between the kernel and user space. Kevent may have run out of steam, but the ring buffer idea is back via a different path. Rusty Russell is now proposing a new system call (called vringfd()) which turns some of the virtio work into a new kernel-to-user ring buffer interface. The submitted patch is breathtaking in its lack of documentation on this new system call, especially considering that its author is quite good with that sort of writing. Your editor has taken this omission as a personal challenge and, as a result, has set about reverse engineering the (somewhat complex) vringfd() interface.

A user-space process which wishes to set up a vring for communication with the kernel must create a slightly complicated data structure first. One starts by deciding how many entries the ring should have; this number must be a power of two which fits into an unsigned, 16-bit value. Given this number (we'll call it RING_SIZE), the data structure looks like this:

    struct messy_vring_thing {
	struct vring_desc descriptors[RING_SIZE];
	struct vring_avail available;
	char padding[up-to-next-page-boundary];
	struct vring_used used[RING_SIZE];
    };

The page alignment for the used array is important - that array might be mapped separately into kernel space. The array must fit into a single page, which puts a practical limit of 256 entries for RING_SIZE on systems with 4096-byte pages. If this API goes forward, chances are good that a way will be found to raise this limit.

Individual descriptors in the ring are described with this structure:

    struct vring_desc
    {
	__u64 addr;	/* Address of the buffer */
	__u32 len; 	/* Length of the buffer */
	__u16 flags;	/* some flags */
	__u16 next;	/* Next buffer in the chain */
    };

For a simple buffer, the application would simply point addr at the beginning and set len to the appropriate value. If the buffer is to be written to by the kernel, the application should also set VRING_DESC_F_WRITE in the flags field.

Things can get more complicated than that, though, in that the vringfd() interface supports multipart scatter/gather buffers. To set up such a buffer, user space would use one vring_desc entry for each segment of the buffer. For all but the final segment, the VRING_DESC_F_NEXT flag (saying "use the next descriptor too") should be set, and next should be the index of the next descriptor. When the kernel grabs a buffer, it will follow the chain and use all segments found until the final one (which lacks the VRING_DESC_F_NEXT flag) is encountered.
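
Assuming ring is an instance of the structure shown earlier and seg0 and seg1 are application buffers, a two-segment chain for the kernel to fill might be set up like this (a sketch, not code from the patch):

    /* Descriptor 0 heads the chain and points at descriptor 1. */
    ring.descriptors[0].addr  = (__u64)(unsigned long)seg0;
    ring.descriptors[0].len   = seg0_len;
    ring.descriptors[0].flags = VRING_DESC_F_WRITE | VRING_DESC_F_NEXT;
    ring.descriptors[0].next  = 1;

    /* Descriptor 1 lacks VRING_DESC_F_NEXT, ending the chain. */
    ring.descriptors[1].addr  = (__u64)(unsigned long)seg1;
    ring.descriptors[1].len   = seg1_len;
    ring.descriptors[1].flags = VRING_DESC_F_WRITE;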

Before the kernel will use buffers set up by the application, though, user space must indicate that the buffer is ready. That is done through the vring_avail structure:

    struct vring_avail
    {
	__u16 flags;
	__u16 idx;
	__u16 ring[RING_SIZE];
    };

The ring array holds indexes into the descriptors array. The idx field should always be the index of the last valid entry in ring. When a new buffer is ready for transfer to or from the kernel, the application will store the index of the first descriptor into ring[idx+1], then increment idx. When the ring is first established, the kernel remembers the position of idx, so the first buffer should be added here after the vringfd() system call is made.
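
Publishing the chain built above would then look something like this sketch (assuming, as the structure suggests but the patch does not spell out, that idx wraps modulo RING_SIZE):

    /* Make descriptor 0, the head of the chain, visible to the kernel. */
    ring.available.ring[(ring.available.idx + 1) % RING_SIZE] = 0;
    ring.available.idx++;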

The kernel will consume buffers from the available ring as needed. Once the requested operation has been performed on the buffer and the kernel is done with it, the buffer will show up in the used area, which is structured this way:

    struct vring_used_elem
    {
	__u32 id;
	__u32 len;
    };

    struct vring_used
    {
	__u16 flags;
	__u16 idx;
	struct vring_used_elem ring[RING_SIZE];
    };

In the vring_used structure, idx is the index of the next entry in ring which may be written by the kernel; it will be incremented after the ring is updated. When a buffer is placed in the used ring, the id field will be the index of the descriptor, and len will be the actual length of the data transferred.

Note that the flags fields in the vring_avail and vring_used structures appear to be unused.

Once the application has this whole data structure set up, it can establish the ring buffer with the kernel with the new system call:

    long vringfd(void *addr, unsigned int ring_size, u16 *last_used);

Here, addr is the base address of the data structure described above, ring_size is the number of descriptors in the ring, and last_used is a 16-bit unsigned integer indicating which entry in the used ring was last consumed by the application. Failure to keep last_used current will not slow things down, but it will keep poll() from working properly.

The return value will be a file descriptor associated with the ring.
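
Since the patch predates any C library wrapper, an application would presumably invoke it via syscall(); __NR_vringfd below stands in for whatever number the patch assigns:

    #include <sys/syscall.h>
    #include <unistd.h>

    __u16 last_used = 0;
    long vr_fd = syscall(__NR_vringfd, &ring, RING_SIZE, &last_used);

    if (vr_fd < 0)
	    perror("vringfd");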

Creating the vring is only part of the job, though. The next step is to connect it with a kernel subsystem for the transfer of data. Rusty's patch includes vring support in the tun virtual network driver; to use that support, an application makes a special ioctl() call to provide the vring file descriptor to the tun driver. Any other subsystem will need a similar mechanism to support vring.

If the application is using the ring to transfer data into the kernel, it must (1) set up one or more descriptors for full data buffers in the available ring, then (2) make a write() call to the vring file descriptor. The buffer and length passed to write() are ignored; all that matters is that a write was done to that file descriptor. When write() returns the operation will have been set in motion, but it cannot be considered to be complete until the ring descriptors show up in the used ring.

For data transfers from the kernel to user space, the application simply puts buffers into the available ring, then waits until they show up in the used ring. A poll() on the vring file descriptor will block until buffers are available. The kernel determines whether unconsumed buffers exist in used by comparing the vring_used->idx index against the application-supplied last_used value. It's worth noting that, depending on how the relevant kernel subsystem works, buffers may not actually make it into the used ring until the poll() call is made.
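
Putting the pieces together, a user-space receive loop might look like the sketch below, reusing vr_fd and last_used from the earlier example; it assumes that last_used counts consumed entries (one plausible reading of the interface), and consume() is an application-defined hypothetical:

    #include <poll.h>

    struct pollfd pfd = { .fd = vr_fd, .events = POLLIN };

    while (poll(&pfd, 1, -1) > 0) {
	/* Drain everything the kernel has completed so far. */
	while (last_used != ring.used.idx) {
		struct vring_used_elem *e =
			&ring.used.ring[last_used % RING_SIZE];

		consume(e->id, e->len);
		last_used++;
	}
    }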

On the kernel side, a developer wanting to add vring support to a subsystem will start by creating a set of vring_ops:

    struct vring_ops
    {
	void (*destroy)(void *);
	int (*pull)(void *);
	int (*push)(void *);
    };

All of these functions take a private pointer given when the subsystem attaches to the vring (to be described shortly). The pull() callback is invoked when the application calls poll(); if any descriptor processing must be done while user space is accessible, this is the place to do it. If pull() adds any buffers to the used ring, it should return the number of buffers; it can also return a negative error code. push() is called in response to a write() call indicating that there are buffers ready to be transferred into the kernel; it returns zero or a negative error code. The destroy() callback is invoked when the vring file descriptor is closed. All of these callbacks are optional.
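
A subsystem's hookup might therefore look like this minimal sketch, with my_destroy(), my_pull(), and my_push() as hypothetical callbacks:

    static const struct vring_ops my_vring_ops = {
	.destroy = my_destroy,	/* the vring file descriptor was closed */
	.pull	 = my_pull,	/* poll(): finish moving buffers to 'used' */
	.push	 = my_push,	/* write(): new buffers are available */
    };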

Attaching to a vring is done with:

    struct vring_info *vring_attach(int fd, const struct vring_ops *ops,
				    void *data, bool atomic_use);

For this call, fd is a file descriptor corresponding to a vring, ops is the operations structure described above, data is a private data pointer which is passed into the vring_ops callbacks, and atomic_use is nonzero if the kernel needs to be able to add buffers to the used ring in atomic context. The return value is a pointer to an internal vring data structure or an ERR_PTR() value if something goes wrong.

To obtain a buffer from the available ring, a call is made to:

    int vring_get_buffer(struct vring_info *vr,
		         struct iovec *in_iov,
		     	 unsigned int *num_in, unsigned long *in_len,
		     	 struct iovec *out_iov,
		     	 unsigned int *num_out, unsigned long *out_len);

This function will fill in an array of iovec structures corresponding to the next available buffer. If the kernel expects to write to the buffer, it should pass the iovec array as in_iov, with num_in pointing to the length of in_iov and in_len pointing to a location in which to store the total length of the buffer (or NULL if that information is not needed). For transfers into the kernel, out_iov, num_out, and out_len should be set similarly. Note that the addresses stored in the iovec arrays are user-space addresses; vring_get_buffer() does not validate them, so the caller must do so.

It is possible to pass both in_iov and out_iov; in this case, only one of the two will be filled in, depending on whether the next buffer in the available ring has the VRING_DESC_F_WRITE flag set. In most cases, though, only one of the two sets of parameters will have non-NULL values. The apparent intent of the API is that, if bidirectional transfers between user space and the kernel are needed, two separate vrings should be used.

The return value from vring_get_buffer() will be one of (1) a positive descriptor index, (2) zero, indicating that no buffers are available, or (3) a negative error code.

The descriptor index should be saved for the final step, which is indicating that the kernel is done with a specific buffer:

    void vring_used_buffer(struct vring_info *vr, int id, u32 len);
    void vring_used_buffer_atomic(struct vring_info *vr, int id, u32 len);

Either one of these functions indicates that the buffer indicated by id should be put into the used ring; len is the amount of data actually transferred. If sleeping is not possible, vring_used_buffer_atomic() should be used - but the vring must have been attached with the atomic_use flag set.
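
A push() callback tying these pieces together might look like the sketch below; struct my_subsys, MY_MAX_SEGS, and my_transmit() are hypothetical:

    static int my_push(void *data)
    {
	struct my_subsys *s = data;
	struct iovec iov[MY_MAX_SEGS];
	unsigned int num_out = MY_MAX_SEGS;
	unsigned long out_len;
	int id;

	/* Only outbound (read-by-kernel) buffers are of interest here. */
	id = vring_get_buffer(s->vr, NULL, NULL, NULL,
			      iov, &num_out, &out_len);
	if (id <= 0)
		return id;	/* 0: ring empty; <0: error */

	my_transmit(s, iov, num_out, out_len);	/* hypothetical */
	vring_used_buffer(s->vr, id, out_len);
	return 0;
    }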

There does not appear to be a way for a subsystem to detach from a vring; it must, instead, wait for the application to close the associated file descriptor.

This interface is in an early stage, and the code has a number of limitations and FIXME comments. So things seem likely to evolve before vringfd() is seriously considered for merging into the mainline kernel. The idea of a ring buffer for this kind of communication seems to come around on a regular basis, though, so it would seem that there is a demand for this kind of API.

Comments (5 posted)

Patches and updates

Kernel trees

Architecture-specific

Core kernel code

Development tools

Vegard Nossum kmemcheck v7 ?
Szabolcs Szakacsits Linux POSIX file system test suite ?

Device drivers

Documentation

Filesystems and block I/O

Memory management

Nick Piggin SLQB: YASA ?
Johannes Weiner Generic show_mem() ?
Andrea Arcangeli mmu notifier #v11 ?

Networking

YOSHIFUJI Hideaki [IPV6] MROUTE: Support multicast routing. ?
YOSHIFUJI Hideaki [IPV6] SIT: ISATAP Updates. ?

Security-related

Virtualization and containers

Benchmarks and bugs

Page editor: Jonathan Corbet


Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds