LWN.net Logo

Re: statfs() / statvfs() syscall ballsup...

From:  Linus Torvalds <torvalds-AT-osdl.org>
To:  Joel Becker <Joel.Becker-AT-oracle.com>
Subject:  Re: statfs() / statvfs() syscall ballsup...
Date:  Fri, 10 Oct 2003 10:40:40 -0700 (PDT)
Cc:  Chris Friesen <cfriesen-AT-nortelnetworks.com>, Jamie Lokier <jamie-AT-shareable.org>, Trond Myklebust <trond.myklebust-AT-fys.uio.no>, Ulrich Drepper <drepper-AT-redhat.com>, Linux Kernel <linux-kernel-AT-vger.kernel.org>


On Fri, 10 Oct 2003, Joel Becker wrote:
> 
> 	msync() forces write(), like fsync().  It doesn't force read().

Actually, the kernel has a "readahead(fd, offset, size)" system call that
will start asynchronous read-ahead on any mapping. After that, just
touching the page will obviously map in and synchronize the result.

I don't think anybody uses it, and the interface may be broken, but it was
literally 20 lines of code, and I had a trivial test program that
populated the cache for a directory structure really quickly using it.

In general, it would be really nice to have more oracle people discussing
what their particular pet horror is, and what they'd really like to do.

I know you're more used to just doing your own thing and working with
vendors, but even just people getting used to do the unofficial "this is
what we do, and it sucks because xxx" would make people more aware of what 
you wan tto do, and maybe it would suggest novel ways of doing things.

I suspect most of the things would get shot down as being impractical, but
there have always been a lot of discussion about more direct control of
the page cache for programs that really want it, and I'm more than willing
to discuss things (obviously 2.7.x material, but still.. A lot of it is
trivial and could be back-ported to 2.6.x if people start using it).

For example, things we can do, but don't, partly because of interface 
issues and because there is no point in doing it if people wouldn't use 
it:

 - moving a page back and forth between user space. It's _trivial_ to do, 
   with a fallback on copying if the page happens to be busy (ie we can 
   often just replace the existing page cache page, but if somebody else
   has it mapped, we'd have to copy the contents instead)

   We can't do this for "regular" read and write, because the resulting 
   copy-on-write sitution makes it less than desireable in most cases, but 
   if the user space specifically says "you can throw these pages away
   after moving them to the page cache", that avoids a lot of horror.

   The "remap_file_pages()" thing kind of does this on the read side (ie 
   it says "map in this page cache entry into my virtual address space"), 
   but we don't have the reverse aka "take this page in the virtual 
   address space and map it into the page cache".

   Interfaces like these would also allow things like zero-copy file
   copies with smaller page cache footprints - at the expense of 
   invalidating the cache for the source file as a result of the copy. 
   Which is why it can't be a _regular_ read - but it's one of those 
   things where if the user knows what he wants..

 - dirty mapping control (ie controlling partial page dirty state, and 
   also _delaying_ writeout if it needs to be ordered). Possibly by having 
   a separate backing store (ie a mmap that says "read from this file, but
   write back to that other file") to avoid the nasty memory management 
   problems.

A lot of these are really easy to do, but the usage and the interfaces are 
non-obvious.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo-AT-vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


(Log in to post comments)

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds