
In brief

By Jonathan Corbet
May 27, 2009
Union directories. While a number of developers are working on the full union mount problem, Miklos Szeredi has taken a simpler approach: union directories. Only top-level directory unification is provided, and changes can only be made to the top-level filesystem. That eliminates the need for a lot of complex code doing directory copy-up, whiteouts, and such, but also reduces the functionality significantly.

Optimizing writeback timers. On a normal Linux system, the pdflush process wakes up every five seconds to force dirty page cache pages back to their backing store on disk. This wakeup happens whether or not there is anything needing to be written back. Unnecessary wakeups are increasingly unwelcome, especially on systems where power consumption matters, so it would be nice to let pdflush sleep when there is nothing for it to do.

Artem Bityutskiy has put together a patch set to do just that. It changes the filesystem API to make it easier for the core VFS to know when a specific filesystem has dirty data. That information is then used to decide whether pdflush needs to be roused from its slumber. The idea seems good, but there's one little problem: this work conflicts with the per-BDI flusher threads patches by Jens Axboe. Jens's patches get rid of the pdflush timer and make a lot of other changes, so these two projects do not currently play well together. So Artem is headed back to the drawing board to base his work on top of Jens's patches instead of the mainline.

recvmmsg(). Arnaldo Carvalho de Melo has proposed a new system call for the socket API:

    struct mmsghdr {
        struct msghdr   msg_hdr;
        unsigned        msg_len;
    };

    ssize_t recvmmsg(int socket, struct mmsghdr *mmsg, int vlen, int flags);

The difference between this system call and recvmsg() is that it is able to accept multiple messages with a single call. That, in turn, reduces system call overhead in high-bandwidth network applications. The comments in the patch suggest that sendmmsg() is in the plans, but no implementation has been posted yet.
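As a rough illustration of the batching idea, the sketch below drains several queued datagrams with one call. It is written against the recvmmsg() wrapper that glibc provides today, which takes a fifth timeout argument that the prototype proposed above does not have; NULL is passed for it here. The names recv_batch() and demo_socketpair(), the BATCH and BUFSIZE constants, and the use of a Unix-domain socket pair as a test source are all assumptions made for the example, not part of the patch.

```c
#define _GNU_SOURCE             /* exposes recvmmsg() and struct mmsghdr in glibc */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH   8               /* hypothetical maximum messages per call */
#define BUFSIZE 256             /* hypothetical per-datagram buffer size */

/*
 * Receive up to BATCH datagrams with one system call.
 * Returns the number of datagrams received (their lengths in lens[]),
 * or -1 with errno set if none could be received.
 */
int recv_batch(int sock, char bufs[BATCH][BUFSIZE], unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        /* One buffer per message; each msg_hdr is an ordinary struct msghdr. */
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSIZE;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is already queued; NULL: no timeout. */
    int n = recvmmsg(sock, msgs, BATCH, MSG_DONTWAIT, NULL);
    assert(n <= BATCH);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;      /* bytes received for message i */
    return n;
}

/* Demo: queue three datagrams on a Unix-domain socket pair, then drain them. */
int demo_socketpair(void)
{
    int sv[2];
    char bufs[BATCH][BUFSIZE];
    unsigned int lens[BATCH];
    const char *payloads[] = { "one", "two", "three" };

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    for (int i = 0; i < 3; i++) {
        if (send(sv[0], payloads[i], strlen(payloads[i]), 0) < 0) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }
    int n = recv_batch(sv[1], bufs, lens);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With recvmsg(), draining the same three datagrams would take three trips into the kernel; here the per-message results come back in the msg_len field of each array element.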

There was a suggestion that this functionality could be obtained by extending recvmsg() with a new message flag, rather than adding a new system call. But, as David Miller pointed out, that won't work. The kernel currently ignores unrecognized flags; that will make it impossible for user space to determine whether a specific kernel supports multiple-message receives or not. So the new system call is probably how this feature will be added.
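The detection argument can be sketched from the user-space side. A system call that the running kernel does not implement fails with ENOSYS, so a program can probe for it without touching a real socket. The function name kernel_has_recvmmsg() is hypothetical, and the sketch uses the five-argument glibc wrapper (with a trailing timeout) rather than the prototype proposed above.

```c
#define _GNU_SOURCE             /* exposes recvmmsg() in glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Returns 1 if the running kernel implements recvmmsg(), 0 if not.
 * The call is made on an invalid descriptor, so it always fails;
 * only the reason for the failure matters.
 */
int kernel_has_recvmmsg(void)
{
    int saved = errno;                  /* do not disturb the caller's errno */
    int ret = recvmmsg(-1, NULL, 0, 0, NULL);

    assert(ret < 0);                    /* descriptor -1 can never succeed */
    /* ENOSYS: no such system call. Anything else (EBADF here) means it exists. */
    int has = !(ret < 0 && errno == ENOSYS);
    errno = saved;
    return has;
}
```

No equivalent probe exists for a new flag passed to recvmsg(): a kernel that does not know the flag ignores it and returns a single message, which looks the same as a newer kernel that happened to have only one message queued.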


Re: In brief

Posted May 28, 2009 4:11 UTC (Thu) by jengelh (guest, #33263) [Link]

Noting it again: Union directories (or even arbitrary dentries) do not provide a separate namespace, i.e. no way to access any of the lower inodes even in read-only mode, so standalone filesystems like unionfs/aufs won't go away anytime soon.

10 Gbps pipes

Posted May 28, 2009 6:44 UTC (Thu) by ncm (guest, #165) [Link]

sendmmsg and recvmmsg will help us keep those 10 Gbps pipes filled. There's nothing worse than paying for a 10G pipe and finding you can only succeed in pushing 2 Gbps through it.

recvmmsg() = great idea

Posted May 28, 2009 16:06 UTC (Thu) by tstover (guest, #56283) [Link]

I do several things with datagrams both with unix domain sockets and UDP. In those cases the programs are always working with recvmsg() directly anyway [as opposed to possibly read()], and would be quick to gain from this idea.

In brief

Posted Jun 1, 2009 16:53 UTC (Mon) by butlerm (subscriber, #13312) [Link] (2 responses)

sendmmsg and recvmmsg are outstanding extensions to the kernel API. There are numerous UDP-style protocols that would benefit significantly from this.

Has anyone got recvmmsg to work

Posted Mar 8, 2010 17:47 UTC (Mon) by 4utomat (guest, #61249) [Link] (1 responses)

Has anyone got recvmmsg to work?
On 2.6.33, the example above fails with errno=14 (Bad Address)

Has anyone got recvmmsg to work

Posted Mar 21, 2012 23:29 UTC (Wed) by gutschke (subscriber, #27910) [Link]

Looking at the kernel sources, it appears that the final version of this system call includes a timeout parameter. If you don't pass this parameter, chances are that it gets set to a random value pointing to invalid memory. That probably explains why you get an EFAULT no matter what you pass in the other parameters.

If you don't want to specify a timeout, just pass NULL as an additional parameter. That should fix your program.


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds