From: Tejun Heo <tj-AT-kernel.org>
To: Peter Zijlstra <peterz-AT-infradead.org>
Subject: Re: workqueue thing
Date: Wed, 23 Dec 2009 13:18:40 +0900
Cc: torvalds-AT-linux-foundation.org, awalls-AT-radix.net,
    linux-kernel-AT-vger.kernel.org, jeff-AT-garzik.org, mingo-AT-elte.hu,
    dhowells-AT-redhat.com, arjan-AT-linux.intel.com, avi-AT-redhat.com,
On 12/22/2009 08:06 PM, Peter Zijlstra wrote:
> On Tue, 2009-12-22 at 08:50 +0900, Tejun Heo wrote:
>>> 3) gets fragile at memory-pressure/reclaim
>> Shared dynamic pool is going to be affected by memory pressure no
>> matter how you implement it. cmwq tries to maintain stable level of
>> workers and has forward progress guarantee. If you're gonna do shared
>> pool, it can't get much better.
> And here I'm questioning the very need for shared stuff, I don't see
> any. That is, I'm not seeing it being worth the hassle.

Then you see the situation pretty differently from the way I do. Maybe
it's caused by the different things we work on. Whenever I want to
create something which would need async context, I'm always faced with
these tradeoffs that I think are silly to worry about at that layer.
It ends up scattering partial solutions all over the place.

libata has two workqueues just because one may depend on the other.
The workqueue used for polling is MT to increase parallelism in case
there are multiple devices which would require polling but it's both
wasteful and not enough - they won't be used most of the time but they
aren't enough when there are multiple pollers on the same CPU. libata
just had to make a rather mediocre in-the-middle tradeoff between
having one poller for each device and sharing a single poller for all.
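
To make the polling tradeoff above concrete, here is a toy userspace model. It is a sketch under stated assumptions, not kernel code and not how cmwq is implemented. It assumes that an MT workqueue has one worker thread per CPU and runs the items queued on a CPU serially on that CPU's worker, and that pollers mostly sleep waiting for the device, so concurrent pollers do not slow each other down. The function names, the (cpu, duration) tuples and the max_workers cap are all hypothetical.

```python
import heapq
from collections import defaultdict


def per_cpu_workqueue(pollers, num_cpus):
    """Model an MT workqueue: one worker per CPU, serial execution per CPU.

    pollers is a list of (cpu, duration) tuples (hypothetical units).
    Returns (threads_created, time_until_last_poller_finishes).
    """
    busy = defaultdict(int)
    for cpu, duration in pollers:
        # Items queued on the same CPU run one after another.
        busy[cpu] += duration
    makespan = max(busy.values(), default=0)
    # Every CPU gets a worker whether or not anything is queued on it.
    return num_cpus, makespan


def shared_pool(pollers, max_workers):
    """Model a shared pool: workers exist only on demand, up to max_workers.

    Returns (workers_used, time_until_last_poller_finishes).
    """
    if not pollers:
        return 0, 0
    workers_used = min(len(pollers), max_workers)
    free_at = [0] * workers_used  # time at which each worker becomes free
    heapq.heapify(free_at)
    for _cpu, duration in pollers:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return workers_used, max(free_at)


# Three pollers that all happen to be queued on CPU 0 of an 8-CPU machine.
pollers = [(0, 10), (0, 10), (0, 10)]
print(per_cpu_workqueue(pollers, num_cpus=8))  # (8, 30)
print(shared_pool(pollers, max_workers=4))     # (3, 10)
```

In this model the per-CPU arrangement creates eight threads, seven of which never run anything, and still serializes the three pollers. The shared pool uses three workers and finishes in the time of a single poll. Real behaviour depends on scheduling and on how long pollers actually block, which the model ignores.
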
The same goes for EH threads. How often they are used depends heavily
on the system configuration. For example, libata handles ATAPI CHECK
CONDITION as an exception and acquires sense data from the exception
handler and it happens pretty frequently. So, I want to have
per-device EHs and have ideas on how to escalate from device level EH
to host level EH. The problem here again is how to maintain the
concurrency because having a single kthread for each block device
won't be acceptable from a scalability POV.

Another similar but less severe problem is in-kernel media presence
pollers. Here, I think I can have a single poller for each device
without having too many scalability issues but it just isn't efficient
because most of the time one poller would be enough. It's only when
you get to the corner cases or error conditions that you would need
more than one. So, again, I can implement a special poller pool for

And there are slow work and async, both of which are there just to
provide process context to tasks which may take quite some time to
complete waiting for IOs, and quite a few ST workqueues which got
separated out because they somehow got involved in some obscure
deadlock condition. The only reason they're ST is that MT would
create too many threads. CPU affinity would work better for them but
they have to make these tradeoffs.

So, if we can have a mechanism which can solve these issues, it's an
obvious plus. Shifting complexity out of peripheral code to better
crafted and managed core code is the right thing to do and it will
shift a lot of complexity out of peripheral code.