From: Avi Kivity <avi-AT-argo.co.il>
To: Badari Pulavarty <pbadari-AT-us.ibm.com>
Subject: Re: [RFC][PATCH] New iovec support & VFS changes
Date: Tue, 20 Dec 2005 20:00:01 +0200
Cc: Al Viro <viro-AT-ftp.linux.org.uk>, hch-AT-lst.de, akpm-AT-osdl.org,
davem-AT-redhat.com, Ulrich Drepper <drepper-AT-redhat.com>,
Linus Torvalds <torvalds-AT-osdl.org>,
Badari Pulavarty wrote:
>I was trying to add support for preadv()/pwritev() for threaded
>databases. Currently the patch is in -mm tree.
>This needs a new set of system calls. Ulrich Drepper pointed out
>that, instead of adding a system call for the limited functionality
>it provides, why not we add new iovec interface as follows (offset-per-
>segment) which provides greater functionality & flexibility.
>+ void __user *iov_base;
>+ __kernel_size_t iov_len;
>+ __kernel_loff_t iov_off; /* NEW */
>In order to support this, we need to change all the file_operations
>(readv/writev) and its helper functions to take this new structure.
>I took a stab at doing it and I want feedback on whether this is
>acceptable. All the patch does - is to make kernel use new structure,
>but the existing syscalls like readv()/writev() still deals with
>original one to keep the compatibility. (pipes and sockets need
>changing too - which I have not addressed yet).
>Is this the right approach ?
You can io_submit() a list of IO_CMD_PREAD[V]s and immediately
io_getevents() them. In addition to specifying different file offsets
you can mix reads and writes, mix file descriptors, and reap nonblocking
events quickly (by specifying a timeout of zero).
Sure, it's two syscalls instead of one, but it's much more flexible,
and databases should be using aio anyway. Oh, and no kernel changes
needed, apart from merging vectored aio.
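
A minimal sketch of the idea, in case it helps: one io_submit() carrying a
batch of reads, each with its own file offset, followed immediately by one
io_getevents(). It uses the raw kernel ABI from <linux/aio_abi.h>, where the
opcode is spelled IOCB_CMD_PREAD (IO_CMD_PREAD is the libaio name), and plain
non-vectored reads. The names struct seg, read_segments(), MAX_BATCH and the
sys_io_*() wrappers are made up for illustration; error handling is reduced
to "return -1".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>   /* struct iocb, struct io_event, IOCB_CMD_PREAD */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define MAX_BATCH 16         /* arbitrary limit for this sketch */

/* Hypothetical per-segment descriptor: buffer, length, file offset. */
struct seg {
    void  *base;
    size_t len;
    off_t  off;
};

/* Thin wrappers over the raw syscalls (glibc does not export these). */
static long sys_io_setup(unsigned long nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **cbs)
{
    return syscall(SYS_io_submit, ctx, nr, cbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *evs, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, evs, timeout);
}

/*
 * Read n segments from fd, each at its own offset.
 * Returns the total number of bytes read, or -1 on any failure.
 */
static long read_segments(int fd, const struct seg *segs, int n)
{
    aio_context_t ctx = 0;   /* must be zero before io_setup() */
    struct iocb cbs[MAX_BATCH];
    struct iocb *ptrs[MAX_BATCH];
    struct io_event evs[MAX_BATCH];
    long total = -1;
    int i;

    assert(n > 0 && n <= MAX_BATCH);
    if (sys_io_setup((unsigned long)n, &ctx) < 0)
        return -1;

    memset(cbs, 0, sizeof cbs);
    for (i = 0; i < n; i++) {
        cbs[i].aio_fildes     = (uint32_t)fd;
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_buf        = (uint64_t)(uintptr_t)segs[i].base;
        cbs[i].aio_nbytes     = segs[i].len;
        cbs[i].aio_offset     = segs[i].off;
        cbs[i].aio_data       = (uint64_t)i;   /* echoed back in the event */
        ptrs[i] = &cbs[i];
    }

    /* Syscall 1: queue every read. Syscall 2: wait for all of them. */
    if (sys_io_submit(ctx, n, ptrs) == n &&
        sys_io_getevents(ctx, n, n, evs, NULL) == n) {
        total = 0;
        for (i = 0; i < n; i++) {
            if (evs[i].res < 0) {              /* negative errno */
                total = -1;
                break;
            }
            total += (long)evs[i].res;
        }
    }

    /* io_destroy() waits for anything still in flight before returning. */
    sys_io_destroy(ctx);
    return total;
}
```

Passing a zeroed struct timespec instead of NULL as the last argument of
io_getevents(), with min_nr set to 0, gives the nonblocking "reap whatever
has finished" behaviour. A real user would set up the context once and reuse
it rather than calling io_setup()/io_destroy() per batch as this sketch does.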
Do not meddle in the internals of kernels, for they are subtle and quick to panic.