Large block size support
The problem is that this patch is not without its difficulties. It adds a certain amount of complexity to the core virtual memory subsystem to implement what is, in all reality, a feature which has been rejected before: larger pages. The patch currently ducks the most difficult part of the problem - handling faults on larger pages, needed to make mmap() work - meaning that more complexity can be expected in the future. Larger blocks in the page cache means more demand for higher-order pages, which are already in short supply on many systems; that, in turn, means that the anti-fragmentation patches would almost certainly be needed as well. Use of larger pages in the page cache can also lead to more internal fragmentation and less efficient memory use.
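To make the internal-fragmentation point concrete, here is a back-of-the-envelope sketch; the file size and block sizes below are arbitrary assumptions, not figures from the discussion. A small file still occupies at least one full page-cache block, so larger blocks waste proportionally more memory:

```c
/*
 * Illustrative only: rough internal fragmentation when a file's
 * final (or only) partial block occupies a whole page-cache block.
 * The sizes used here are made-up examples.
 */
#include <stdio.h>

int main(void)
{
	const unsigned long file_size = 5000;	/* a small 5,000-byte file */
	const unsigned long block_sizes[] = { 4096, 16384, 65536 };

	for (int i = 0; i < 3; i++) {
		unsigned long bs = block_sizes[i];
		/* round the file up to a whole number of blocks */
		unsigned long blocks = (file_size + bs - 1) / bs;
		unsigned long used = blocks * bs;

		printf("block size %6lu: %6lu bytes cached, %6lu bytes wasted\n",
		       bs, used, used - file_size);
	}
	return 0;
}
```

With these made-up numbers, caching a 5,000-byte file wastes about 3KB with 4KB blocks, but nearly 60KB with 64KB blocks.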
For all these reasons, Andrew Morton has been expressing some reservations.
Andrew is not necessarily opposed to the patch; he is more concerned that it not be merged until it has been carefully compared with the alternatives. He suggests keeping the page cache entry size unchanged, but trying to allocate entries in higher-order groups. That would result in larger blocks being stored contiguously in memory without the memory subsystem changes. Filesystems could use those larger blocks, and hardware could treat them as single units in scatter/gather lists for DMA, leading to more efficient operations.
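For readers unfamiliar with higher-order allocations, the following is a minimal kernel-style sketch of the idea, not code from any posted patch; alloc_pages(), page_address(), and free_pages() are real kernel interfaces, while alloc_large_block() and free_large_block() are hypothetical names used here for illustration:

```c
/*
 * Sketch only: a higher-order allocation yields a physically
 * contiguous group of pages that a filesystem could treat as one
 * larger block, while the page cache keeps tracking the individual
 * PAGE_SIZE pages.
 */
#include <linux/gfp.h>
#include <linux/mm.h>

static void *alloc_large_block(unsigned int order)
{
	/* 2^order contiguous pages; e.g. order 2 = 16KB with 4KB pages */
	struct page *page = alloc_pages(GFP_KERNEL, order);

	if (!page)
		return NULL;		/* higher orders fail more easily */
	return page_address(page);	/* contiguous kernel virtual address */
}

static void free_large_block(void *addr, unsigned int order)
{
	free_pages((unsigned long)addr, order);
}
```

The appeal of this approach is that the memory-management core need not change; the cost is that order-N allocations become harder to satisfy as memory fragments, which is exactly the concern raised above.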
Another possibility which has been raised is raising the maximum size of hardware scatter/gather lists or allowing them to be chained. Drivers could then set up larger I/O operations, improving efficiency without requiring the other changes.
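As a rough sketch of what chained scatter/gather lists look like, the fragment below uses the scatterlist helpers that were later merged into the mainline kernel (sg_init_table() and sg_chain()); the wrapper function and table size are hypothetical:

```c
/*
 * Sketch of the chained scatter/gather idea: rather than one huge
 * fixed-size array of scatterlist entries, smaller tables are linked
 * together, so a driver can describe a larger I/O without needing a
 * large contiguous allocation for the list itself.
 */
#include <linux/scatterlist.h>

#define SEG_PER_TABLE 128

static void link_sg_tables(struct scatterlist *first,
			   struct scatterlist *second)
{
	sg_init_table(first, SEG_PER_TABLE);
	sg_init_table(second, SEG_PER_TABLE);

	/*
	 * The last entry of the first table becomes a link to the
	 * second, so the first table carries SEG_PER_TABLE - 1 data
	 * segments.
	 */
	sg_chain(first, SEG_PER_TABLE, second);
}
```

The point of chaining is that the size of a single I/O operation is no longer bounded by how many scatterlist entries fit into one contiguous table.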
Still, there is support for Christoph's patch. It would make support of larger blocks relatively straightforward for the lower layers, perhaps enabling the removal of some real hacks currently found in drivers and filesystems. The patch would also allow ext3 filesystems with larger block sizes - sometimes created on ia64 systems, which use larger pages - to be mounted on other architectures. Christoph Hellwig likes the idea that a higher-order page cache could force a solution to the longstanding problem of physical memory fragmentation. To many, it seems like a straightforward and necessary answer to a problem which has lingered for too long.
So the large block size idea is unlikely to just go away. It may be a while, though, before its proponents can do enough homework and benchmarking to fully address the worries which have been expressed. Fundamental changes are often the ones which take the longest to get into the kernel, so there is little that is surprising here. Just don't ask for a prediction of the final outcome.
| Index entries for this article | |
|---|---|
| Kernel | Block layer |
| Kernel | Memory management/Page cache |
offtopic...

Posted May 3, 2007 10:17 UTC (Thu) by jospoortvliet (guest, #33164)

hi, I'd like to say I LOVE the kernel page this week. Not that it's worse other weeks, but it's just such a great read that I felt the need to express my feelings about it. The articles are very informative, nicely written - the kernel page is surely my most favorite one in the Weekly News :D

Though the frontpage is fun as well, a little controversy now and then doesn't hurt anybody ;-)

offtopic...

Posted May 3, 2007 20:29 UTC (Thu) by jengelh (guest, #33263)

+1.

(And some filler text to make LWN accept the comment.)

Is Andrew suggesting the creation of a scalable page cache size?

Posted May 4, 2007 11:20 UTC (Fri) by PhilHannent (guest, #1241)

Regards