
Re: [00/41] Large Blocksize Support V7 (adds memmap support)

From:  Linus Torvalds <torvalds-AT-linux-foundation.org>
To:  Andrea Arcangeli <andrea-AT-suse.de>
Subject:  Re: [00/41] Large Blocksize Support V7 (adds memmap support)
Date:  Tue, 18 Sep 2007 12:44:45 -0700 (PDT)
Message-ID:  <alpine.LFD.0.999.0709181239320.16478@woody.linux-foundation.org>
Cc:  Nick Piggin <nickpiggin-AT-yahoo.com.au>, Christoph Lameter <clameter-AT-sgi.com>, Mel Gorman <mel-AT-skynet.ie>, linux-fsdevel-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org, Christoph Hellwig <hch-AT-lst.de>, William Lee Irwin III <wli-AT-holomorphy.com>, David Chinner <dgc-AT-sgi.com>, Jens Axboe <jens.axboe-AT-oracle.com>, Badari Pulavarty <pbadari-AT-gmail.com>, Maxim Levitsky <maximlevitsky-AT-gmail.com>, Fengguang Wu <fengguang.wu-AT-gmail.com>, swin wang <wangswin-AT-gmail.com>, totty.lu-AT-gmail.com, hugh-AT-veritas.com, joern-AT-lazybastard.org



On Tue, 18 Sep 2007, Andrea Arcangeli wrote:
> 
> Many? I can't recall anything besides PF_MEMALLOC and the decision
> that the VM is oom.

*All* of the buddy bitmaps, *all* of the GFP_ATOMIC, *all* of the zone 
watermarks, everything that we depend on every single day, is in the end 
just about statistically workable.

We do 1- and 2-order allocations all the time, and we "know" they work. 
Yet Nick (and this whole *idiotic* thread) has all been about how they 
cannot work.

> In general every time reliability has a lower priority than performance
> I have a hard time enjoying it.

This is not about performance. Never has been. It's about SGI wanting a 
way out of their current 16kB mess.

The way to fix performance is to move to x86-64, and use 4kB pages and be 
happy. However, the SGI people want a 16kB (and possibly bigger) 
crap-option for their people who are (often _already_) running some 
special case situation that nobody else cares about.

It's not about "performance". If it was, they would never have used ia64 
in the first place.  It's about special-case users that do odd things.

Nobody sane would *ever* argue for 16kB+ blocksizes in general. 

		Linus

PS. Yes, I realize that there's a lot of insane people out there. However, 
we generally don't do kernel design decisions based on them. But we can 
pat the insane users on the head and say "we won't guarantee it works, but 
if you eat your prozac, and don't bother us, go do your stupid things".




Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Posted Sep 20, 2007 17:16 UTC (Thu) by zooko (subscriber, #2589) [Link]

In an article that I read on LWN today, the author speculated that some OpenBSD folks might be influenced by the rude behavior of their leader.

I was reminded of that when I read this note. I suppose Linus intended it to be funny, and I suppose many of his admirers will think it well put, but personally I find it to be rude. This seems to be the case with much of Linus's output.

Regards,

Zooko

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Posted Sep 20, 2007 23:26 UTC (Thu) by lorien420 (guest, #44036) [Link]

As one of those admirers of Linus, I think this comment is rude.

My honest opinion is that the two share a similar operating procedure, but Linus doesn't foster the attitude as much. In the end, he seems to care more than Theo whether the rudeness and the flame-fest style posts are taking away from the discussion or not. It's a thin line, and you can easily attribute my claiming it to generally preferring what Linus has to say.

I appreciate a lot of what Theo has done as well, but he seems to have been a jerk in a way that makes people think he's a jerk. Linus seems to have turned it more into a character quirk that people laugh about.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Posted Sep 20, 2007 23:46 UTC (Thu) by nix (subscriber, #2304) [Link]

Also, Linus was known on the net as a bit of a flamer before he was known
as a kernel hacker. This has not changed. :)

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Posted Sep 21, 2007 0:14 UTC (Fri) by xorbe (subscriber, #3165) [Link]

(I shouldn't feed trolls...)
The trick is that Linus sticks to technical discussions with
real facts generally, and usually manages a humorous angle.
Of course I have no facts to present, and have no funny
quips here. Cheers!

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Posted Sep 27, 2007 13:15 UTC (Thu) by forthy (guest, #1525) [Link]

This is not only rude, it also exposes too much "we have always done it that way" thinking. If you think about why block sizes (and we are talking about disk transfers, not about page sizes in memory, where the tradeoff is different) are the size they are, you end up with latencies and transfer rates. The rule of thumb for block size is that a block is optimal when transferring its data (transfer rate) takes about as long as accessing it (seek time). For current disk hardware, the optimal block size is on the order of half a megabyte (more than 100 Linux blocks).
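As a rough sketch of that rule of thumb: the break-even block size is simply seek time multiplied by transfer rate. The 8 ms seek time and 60 MB/s transfer rate below are illustrative assumptions, not figures from the comment, and the function name is made up for this example.

```python
def break_even_block_size(seek_time_s: float, transfer_rate_bytes_per_s: float) -> float:
    """Block size (bytes) whose transfer time equals the seek time."""
    return seek_time_s * transfer_rate_bytes_per_s


# Assumed, illustrative disk parameters (not taken from the comment).
SEEK_TIME_S = 0.008                 # 8 ms average access time
TRANSFER_RATE = 60 * 10**6          # 60 MB/s sustained transfer
LINUX_BLOCK = 4096                  # 4 KiB block

size = break_even_block_size(SEEK_TIME_S, TRANSFER_RATE)
blocks = size / LINUX_BLOCK
print(f"break-even size: {size / 1024:.0f} KiB, about {blocks:.0f} blocks of {LINUX_BLOCK} bytes")
```

With these assumed numbers the result lands a little under half a megabyte, which is consistent with the "more than 100 Linux blocks" estimate; a different disk would shift the figure.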

Now, the picture is far from clear, because hard disk access is done with prefetch, and people (file system designers) put related data that spans more than one block into consecutive blocks. This helps to get along with smaller than optimal blocks. But you can still take a quantitative approach to that: The seek time should not dominate in either case. With >100 blocks to be read in one go, this is unlikely, even when the kernel tries hard.
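One way to make the "seek time should not dominate" criterion quantitative is to compute the share of total I/O time spent seeking when a given number of contiguous blocks is read per seek. This is only a sketch: the disk parameters are the same kind of illustrative assumption as before, and `seek_fraction` is a hypothetical helper.

```python
def seek_fraction(n_blocks: int, block_size: int,
                  seek_time_s: float, rate_bytes_per_s: float) -> float:
    """Fraction of total time spent seeking for one seek plus n contiguous blocks."""
    transfer_time = n_blocks * block_size / rate_bytes_per_s
    return seek_time_s / (seek_time_s + transfer_time)


# Assumed, illustrative disk parameters (not taken from the comment).
SEEK_TIME_S = 0.008
RATE = 60 * 10**6
BLOCK = 4096

for n in (1, 8, 32, 128):
    share = seek_fraction(n, BLOCK, SEEK_TIME_S, RATE)
    print(f"{n:4d} blocks per seek: {share:.0%} of the time is seeking")
```

Under these assumptions a single 4 KiB block per seek spends almost all of its time seeking, while the seek share only drops to roughly half once more than 100 blocks are read in one go.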

In the old days, system architects expanded page size as needed. VAX had a page size of 512 bytes, and SCSI disks got their sector size of 512 because back then this was the sweet spot. By the time the 386 got a paging unit, the sweet spot had moved to 4k. That was two decades ago and the sweet spot has moved on, but since x86 is bug-compatible with the 8080, nothing introduced at one point can be revised later. Maybe a few years from now we can use the 2MB superpages, because they will soon be close to the sweet spot. But anyway, page sizes have different tradeoffs than block sizes, so the best thing is to separate the two.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

Posted Sep 28, 2007 9:09 UTC (Fri) by Ross (subscriber, #4065) [Link]

The optimal size of a block also depends on fragmentation: the block is the minimum allocation unit. It's still true that most files on a system are small -- far smaller than half a megabyte.
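The cost of a large minimum allocation unit can be sketched by rounding each file up to a whole number of blocks and summing the unused tail space. The file sizes below are invented purely for illustration, and the helper names are hypothetical.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Space a file occupies when rounded up to whole blocks."""
    return -(-file_size // block_size) * block_size  # integer ceiling division


def total_slack(file_sizes: list[int], block_size: int) -> int:
    """Bytes allocated but unused across all files."""
    return sum(allocated_bytes(s, block_size) - s for s in file_sizes)


# Hypothetical set of mostly small files (sizes in bytes).
FILE_SIZES = [200, 1500, 4096, 10_000, 70_000]

for block_size in (4096, 16 * 1024, 512 * 1024):
    slack = total_slack(FILE_SIZES, block_size)
    print(f"block size {block_size:7d}: {slack} bytes of slack")
```

For this made-up sample, half-megabyte blocks waste far more space than the files themselves contain, which is the fragmentation concern in a nutshell; a real measurement would need an actual file-size distribution.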

I also somewhat disagree about the seek & transfer analysis. It assumes that blocks which are read or written together are not contiguous on disk, which they usually are, or that drivers aren't able to batch them up, which they do (and drives themselves have large buffers to make this work even for the stupidest drivers).

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds