In defense of per-BDI writeback
Chris Mason has tried to provide that justification with a combination of benchmark results and explanations. The benchmarks show a clear - and large - performance improvement from the use of per-BDI writeback. That is good, but does not, by itself, justify the switch to per-BDI writeback; Andrew had suggested that the older code was slower as the result of performance regressions introduced over time by other changes. If the 2.6.31 code could be fixed, the performance improvement could be (re)gained without replacing the entire subsystem.
What Chris is saying is that the old, pdflush-based method could not be fixed. The fundamental problem with pdflush is that it would back off when the backing device appeared to be congested. But congestion is easy to cause, and no other part of the system backs off in the same way, so pdflush could end up doing no writeback at all for significant periods of time. Forcing all other writers to back off in the face of congestion might improve things, but that would be a big change, and it would not address the other problem: congestion-based backoff can defeat attempts by filesystem code and the block layer to write large, contiguous segments to disk.
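To make that backoff pattern concrete, here is a minimal, self-contained userspace sketch of the idea. It is not the kernel's pdflush code: the `device` structure, the chunk size, and the `writeback_pass()` helper are all invented for illustration, and the sleep merely stands in for the kernel's congestion_wait().

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative stand-in for a backing device; not a kernel structure. */
struct device {
    const char *name;
    bool congested;     /* set when the device's queue looks full */
    long dirty_pages;   /* pages waiting to be written back */
};

/* One pass of a pdflush-style loop: back off from congested devices. */
static void writeback_pass(struct device *devs, int ndevs)
{
    for (int i = 0; i < ndevs; i++) {
        if (devs[i].congested)
            continue;   /* nothing at all is written for this device */

        /* Write back a bounded chunk, then move on to the next device. */
        long chunk = devs[i].dirty_pages < 1024 ? devs[i].dirty_pages : 1024;
        devs[i].dirty_pages -= chunk;
        printf("%s: wrote %ld pages\n", devs[i].name, chunk);
    }
}

int main(void)
{
    struct device devs[] = {
        { "sda", false, 8192 },
        { "sdb", true,  8192 },  /* congested: skipped on every pass */
    };

    for (int pass = 0; pass < 4; pass++) {
        writeback_pass(devs, 2);
        sleep(1);   /* crude stand-in for congestion_wait() */
    }
    return 0;
}
```

In this toy, the congested device sees no writeback at all, and even the uncongested one is only ever fed bounded chunks - roughly how congestion-based backoff can defeat attempts to write large, contiguous segments.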
As it happens, there is a more general throttling mechanism already built
into the block layer: the finite number of outstanding requests allowed for
any specific device. Once requests are exhausted, threads generating block
I/O operations are forced to wait until request slots become free again.
Pdflush cannot use this mechanism, though: it must perform writeback to
multiple devices at once, so blocking on request allocation for one device
would stall writeback for all of the others. A per-device writeback thread
can block there safely, since doing so will not affect I/O to any other
device. The per-BDI patch
creates these per-device threads and, as a result, it is able to keep
devices busier. That, it seems, is why the old writeback code needed to be
replaced instead of patched.
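A correspondingly rough sketch of the per-device alternative follows. The blocking submit_request() below merely stands in for the block layer's request allocation (which really does make callers wait for a free slot); the structure, the slot count, and the chunk size are again invented for illustration. The point is just that a flusher thread dedicated to one device can block on that device's queue without delaying writeback anywhere else.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

/* Hypothetical per-device state: a fixed pool of request slots. */
struct device {
    const char *name;
    sem_t slots;              /* counts free request slots */
    long dirty_pages;
};

/* Waits for a free request slot; sem_wait() blocks when none are left,
 * which is how the block layer throttles writers. */
static void submit_request(struct device *dev, long pages)
{
    sem_wait(&dev->slots);
    printf("%s: queued %ld pages\n", dev->name, pages);
    /* In this toy the "I/O" completes instantly, so the slot is released
     * immediately; a real driver would do this on I/O completion. */
    sem_post(&dev->slots);
}

/* One flusher thread per device: blocking here stalls nobody else. */
static void *flusher(void *arg)
{
    struct device *dev = arg;

    while (dev->dirty_pages > 0) {
        long chunk = dev->dirty_pages < 4096 ? dev->dirty_pages : 4096;
        submit_request(dev, chunk);
        dev->dirty_pages -= chunk;
    }
    return NULL;
}

int main(void)
{
    struct device devs[2] = {
        { .name = "sda", .dirty_pages = 16384 },
        { .name = "sdb", .dirty_pages = 16384 },
    };
    pthread_t threads[2];

    for (int i = 0; i < 2; i++) {
        sem_init(&devs[i].slots, 0, 128);   /* e.g. 128 outstanding requests */
        pthread_create(&threads[i], NULL, flusher, &devs[i]);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```

Because such a thread never has to back off, it can keep handing its device large chunks and simply stalls when the device itself cannot keep up.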
| Index entries for this article | |
|---|---|
| Kernel | Block layer/Writeback |
| Kernel | Memory management/Writeback |
Posted Oct 1, 2009 8:33 UTC (Thu) by axboe (subscriber, #904)
Posted Oct 1, 2009 14:54 UTC (Thu) by peter_w_morreale (guest, #30066)
The old writeback code traversed super blocks in order, skipping over those currently congested and without regard to the throughput of the devices backing the supers. Recall that the old writeback code/pdflush indiscriminately issued writes until the dirty-memory threshold was reached.
This could have led (and probably did lead) to performance penalties for applications referencing the *fast* devices while improving the performance of apps on the slow devices. It certainly led to unfairness issues wrt who dirties memory and who cleans it.
Consider the following kludged example to illustrate the point. Two apps, both dirtying pages at the same rate, one app backed by a "fast" device, the other by a "slow" device. Both apps are contributing to the dirty page count at the same rate, so now pdflush and writeback kick in.
Since the slow device will remain in a "congested" state longer (since it is "slow"), the faster device will eventually account for more cleaning of pages than the slow device.
This has two effects:
1) Dirty pages for the app on the slow device potentially stay in memory longer and have a better chance of being re-referenced without I/O.
2) Dirty pages for the "fast" device are more likely to be written out and consequently require an I/O for re-reference.
So we wind up penalizing the app on the fast storage device. In theory at least. :-)
I haven't looked at the per-BDI code, but with it, it is now possible to apply fairness to ensure that each device cleans its share of dirty pages. (Whether that is a good thing or not, I don't know; it's just that it enables the capability.)
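A toy simulation of the kludged example above makes the asymmetry visible. Every number here (dirtying rates, the congestion duty cycle, the per-pass limits) is made up, and the loop only mimics the "skip congested devices" behavior described in the article:

```c
#include <stdio.h>

/* Toy model: two apps dirty pages at the same rate; writeback skips
 * whichever device is currently "congested".  All numbers are made up. */
int main(void)
{
    long dirty_fast = 0, dirty_slow = 0;      /* dirty pages per device  */
    long cleaned_fast = 0, cleaned_slow = 0;  /* pages written back      */

    for (int tick = 0; tick < 1000; tick++) {
        /* Both apps dirty pages at the same rate. */
        dirty_fast += 100;
        dirty_slow += 100;

        /* The slow device is congested most of the time (say 9 ticks
         * out of 10), so the old writeback loop skips it. */
        int slow_congested = (tick % 10) != 0;

        /* Writeback pass: clean from whichever devices aren't congested. */
        long c = dirty_fast < 150 ? dirty_fast : 150;   /* fast device rate */
        dirty_fast -= c;
        cleaned_fast += c;

        if (!slow_congested) {
            c = dirty_slow < 50 ? dirty_slow : 50;      /* slow device rate */
            dirty_slow -= c;
            cleaned_slow += c;
        }
    }

    printf("fast device: %ld pages cleaned, %ld still dirty\n",
           cleaned_fast, dirty_fast);
    printf("slow device: %ld pages cleaned, %ld still dirty\n",
           cleaned_slow, dirty_slow);
    return 0;
}
```

After 1000 ticks the fast device's pages have all been cleaned promptly (so re-referencing them needs I/O), while most of the slow device's dirty pages are still sitting in memory - effects 1) and 2) above.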
Posted Oct 3, 2009 15:00 UTC (Sat) by anton (subscriber, #25547) (2 responses)
I do not think that the problem was in the flash device, because it
was originally new (no need to shuffle old data around), the slowdown
occurred pretty soon (not only near the end), and various measures
taken at the host end helped (like invoking sync, or writing the data in
smaller batches with syncing in between).
I had a similar experience when trying to fill my 8GB ogg player with
music, except that this device was slow to begin with (3MB/s when
writing a few hundred MB), but filling it up still should not have
taken 8 hours (280KB/s).
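For what it's worth, the "smaller batches with syncing in between" workaround might look roughly like the sketch below; the 8MB batch size and the use of fdatasync() are arbitrary choices for illustration, not what cp or the kernel actually does.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BATCH (8 * 1024 * 1024)   /* flush after every 8MB; arbitrary */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    int in = open(argv[1], O_RDONLY);
    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }

    char buf[64 * 1024];
    ssize_t n;
    size_t since_sync = 0;

    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, n) != n) {
            perror("write");
            return 1;
        }
        since_sync += n;
        if (since_sync >= BATCH) {
            fdatasync(out);       /* force writeback before dirtying more */
            since_sync = 0;
        }
    }

    fdatasync(out);
    close(in);
    close(out);
    return 0;
}
```

The idea is simply to keep the backlog of dirty data small, so that background writeback never has much to drain at once.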
Posted Oct 11, 2009 6:30 UTC (Sun) by mfedyk (guest, #55303) (1 response)
If I copied files with cp or mv, I noticed a marked improvement in throughput compared to the gnome file manager.
Try it again with mv or cp and see if there is a difference.
Posted Oct 11, 2009 12:59 UTC (Sun) by anton (subscriber, #25547)
> The fundamental problem with pdflush is that it would back off when the backing device appeared to be congested.

That might explain the huge slowdowns I saw (on Linux 2.6.19 and 2.6.27) when writing several GB to flash devices. One was a pretty fast 8GB SD card (SDHC class 6, i.e. >6MB/s on a certain workload; I typically saw >10MB/s when writing several hundred MB), yet it took several hours to fill up; I no longer remember whether the system also suffered in other ways. Calling sync now and then seemed to help, but the whole thing still took a very long time.
I did use cp.