Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words
[Posted November 2, 2012 by mkerrisk]
From: NeilBrown <neilb-AT-suse.de>
To: John Stultz <john.stultz-AT-linaro.org>
Subject: Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words
Date: Tue, 2 Oct 2012 17:39:28 +1000
Message-ID: <20121002173928.2062004e@notabene.brown>
Cc: LKML <linux-kernel-AT-vger.kernel.org>,
    Andrew Morton <akpm-AT-linux-foundation.org>,
    Android Kernel Team <kernel-team-AT-android.com>,
    Robert Love <rlove-AT-google.com>, Mel Gorman <mel-AT-csn.ul.ie>,
    Hugh Dickins <hughd-AT-google.com>,
    Dave Hansen <dave-AT-linux.vnet.ibm.com>,
    Rik van Riel <riel-AT-redhat.com>,
    Dmitry Adamushko <dmitry.adamushko-AT-gmail.com>,
    Dave Chinner <david-AT-fromorbit.com>,
    Andrea Righi <andrea-AT-betterlinux.com>,
    "Aneesh Kumar K.V" <aneesh.kumar-AT-linux.vnet.ibm.com>,
    Mike Hommey <mh-AT-glandium.org>, Taras Glek <tglek-AT-mozilla.com>,
    Jan Kara <jack-AT-suse.cz>, KOSAKI Motohiro <kosaki.motohiro-AT-gmail.com>,
    Michel Lespinasse <walken-AT-google.com>,
    Minchan Kim <minchan-AT-kernel.org>,
    "linux-mm-AT-kvack.org" <linux-mm-AT-kvack.org>
On Fri, 28 Sep 2012 23:16:30 -0400 John Stultz <john.stultz@linaro.org> wrote:
>
> After Kernel Summit and Plumbers, I wanted to consider all the various
> side-discussions and try to summarize my current thoughts here along
> with sending out my current implementation for review.
>
> Also: I'm going on four weeks of paternity leave in the very near
> (but non-deterministic) future. So while I hope I still have time
> for some discussion, I may have to deal with fussier complaints
> than yours. :) In any case, you'll have more time to chew on
> the idea and come up with amazing suggestions. :)
Hi John,
I wonder if you are trying to please everyone and risking pleasing no-one?
Well, maybe not quite that extreme, but you can't please all the people all
the time.
For example, allowing sub-page volatile regions seems to be above and beyond
the call of duty. You cannot mmap sub-pages, so why should they be volatile?
Similarly the suggestion of using madvise - while tempting - is probably a
minority interest and can probably be managed with library code. I'm glad
you haven't pursued it.
I think discarding whole ranges at a time is very sensible, and so merging
adjacent ranges is best avoided. If you require page-aligned ranges this
becomes trivial - is that right?
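To make the page-aligned requirement concrete, here is a minimal sketch of
what library code might do with an arbitrary byte range before handing it to
the kernel: round the start up and the end down, so that only whole pages are
ever marked volatile. The names `struct page_range` and `shrink_to_pages` are
hypothetical and not part of the patch set, and the sketch assumes that
`addr + len` does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
struct page_range {
    uintptr_t start; /* first byte of the range, page aligned */
    size_t len;      /* multiple of the page size; 0 if no whole page fits */
};

/*
 * Shrink [addr, addr + len) inward to whole pages.
 * A partially covered page is never included, so no page that still
 * holds live data ends up volatile.
 */
static struct page_range shrink_to_pages(uintptr_t addr, size_t len,
                                         size_t page_size)
{
    struct page_range r = { 0, 0 };

    /* Page sizes are powers of two; the mask arithmetic relies on it. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t first = (addr + mask) & ~mask; /* round the start up */
    uintptr_t last = (addr + len) & ~mask;   /* round the end down */

    if (first < last) {
        r.start = first;
        r.len = (size_t)(last - first);
    }
    return r;
}
```

With ranges in this form, two ranges are either disjoint or share whole
pages, which is what makes keeping them separate straightforward.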
I wonder if the oldest page/oldest range issue can be defined away by
requiring apps to touch the first page in a range when they touch the range.
Then the age of a range is the age of the first page. Non-initial pages
could even be kept off the free list .... though that might confuse NUMA
page reclaim if a range had pages from different nodes.
Application to non-tmpfs files seems very unclear and so probably best
avoided.
If I understand you correctly, then you have suggested both that a volatile
range would be a "lazy hole punch" and a "don't let this get written to disk
yet" flag. It cannot really be both. The former sounds like fallocate,
the latter like fadvise.
I think the latter sounds more like the general purpose of volatile ranges,
but I also suspect that some journalling filesystems might be uncomfortable
providing a guarantee like that. So I would suggest firmly stating that it
is a tmpfs-only feature. If someone wants something vaguely similar for
other filesystems, let them implement it separately.
The SIGBUS interface could have some merit if it really reduces overhead. I
worry about app bugs that could result from the non-deterministic
behaviour. A range could get unmapped while it is in use and testing for
the case of "get a SIGBUS halfway through accessing something" would not
be straightforward (SIGBUS on first step of access should be easy).
I guess that is up to the app writer, but I have never liked anything about
the signal interface and encouraging further use doesn't feel wise.
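To illustrate the "halfway through" case, here is a rough userspace sketch
of how an app might guard an access with sigsetjmp()/siglongjmp(). It is
only one possible approach, not something the patch set prescribes. All
names (`guarded_access`, `install_sigbus_guard`, `fault_halfway`,
`struct progress`) are hypothetical, the sketch is single-threaded, and
raise(SIGBUS) stands in for the kernel purging the range mid-access.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf guard_env;
static volatile sig_atomic_t guard_active;

/* SIGBUS handler: unwind to the guard if one is active. */
static void on_sigbus(int sig)
{
    (void)sig;
    if (guard_active)
        siglongjmp(guard_env, 1);
    /* Not inside a guarded access: fall back to the default action. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Returns 0 on success, -1 if the handler could not be installed. */
static int install_sigbus_guard(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Run fn(arg); return 0 if it completed, -1 if SIGBUS interrupted it. */
static int guarded_access(void (*fn)(void *), void *arg)
{
    /* savemask = 1 so SIGBUS is unblocked again after the jump. */
    if (sigsetjmp(guard_env, 1) != 0) {
        guard_active = 0;
        return -1;
    }
    guard_active = 1;
    fn(arg);
    guard_active = 0;
    return 0;
}

/* Hypothetical record of how far a multi-step access got. */
struct progress {
    volatile int steps_done;
};

/* Simulated access that loses its data after the first step. */
static void fault_halfway(void *arg)
{
    struct progress *p = arg;

    p->steps_done = 1; /* first half of the work succeeded */
    raise(SIGBUS);     /* stand-in for the range being purged here */
    p->steps_done = 2; /* not reached when the guard is installed */
}
```

After guarded_access(fault_halfway, &p) returns -1, p.steps_done is left at
1: the app is holding half-finished work and has to know how to back it out.
That recovery path is the part that is hard to exercise in testing.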
That's my 2c worth for now. Keep up the good work,
NeilBrown