From: Tejun Heo <tj-AT-kernel.org>
To: Linus Torvalds <torvalds-AT-linux-foundation.org>
Subject: Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
Date: Tue, 15 Jan 2013 15:50:43 -0800
Cc: Ming Lei <ming.lei-AT-canonical.com>,
    Alex Riesen <raa.lkml-AT-gmail.com>,
    Alan Stern <stern-AT-rowland.harvard.edu>,
    Jens Axboe <axboe-AT-kernel.dk>,
    USB list <linux-usb-AT-vger.kernel.org>,
    Linux Kernel Mailing List <linux-kernel-AT-vger.kernel.org>,
    Arjan van de Ven <arjan-AT-linux.intel.com>
cc'ing Arjan. Arjan, the original thread can be read from
On Tue, Jan 15, 2013 at 12:18:01PM -0800, Linus Torvalds wrote:
> I think that is a good solution if it works, but look out: we need to
> synchronize across *all* domains, not just the default one. The sd.c
> code, for example, uses its own "scsi_sd_probe_domain" for example,
> and we *do* want to synchronize with it.
> Can you do that with your suggested interface (ie it would have to be
> a *global* sequence number).
So, I've been thinking about it for a while now and it looks like
async is cutting too many corners to implement any sane stackable
flushing scheme on top. There simply isn't much information to
determine who should wait for what.
I've thought of two workarounds. Both suck.
A. Try to detect deadlock conditions from synchronize(). If a deadlock
condition involving other async jobs is detected, whine about it
and then skip. Ignore a deadlock condition on self (should solve
this particular case).
Detecting a deadlock condition isn't difficult if there are only
global synchronizations; unfortunately, fragmented dependencies via
domain-local synchronization make this non-trivial.
We can still do the ignore-self thing mostly trivially, though. This will
at least work around the problem at hand.
B. The ranged synchronization I first suggested. The problem with
this is that it's common practice for a given async job to try to
flush anything which comes before it. This can introduce spurious
synchronization dependencies which can then lead to deadlocks.
These conditions can be detected and ignored, at least when only
considering global synchronizations. The problem here is that
those deadlock conditions will occur under normal usage and thus
should be ignored silently, which basically means synchronization
would silently skip the wait and finish successfully even when there
are legitimate deadlocks which should be investigated.
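To make the global-only case in A concrete, here is a rough userspace sketch of the check, not kernel code. Everything in it is an assumption made for illustration: `struct job`, `wait_bound`, `waits_for()`, `reaches()` and `would_deadlock()` are hypothetical names, and the model assumes a job blocked in a global synchronize waits for every other pending job whose cookie is below its bound. A job never counts as waiting on itself, which is the ignore-self part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 8

/* Hypothetical model of one pending async job (not a kernel structure). */
struct job {
    unsigned long cookie;     /* position in the single global cookie space */
    bool waiting;             /* blocked in a global synchronize call */
    unsigned long wait_bound; /* waits for every pending cookie below this */
};

/* Does job a wait for job b?  A job is never considered to wait on itself. */
static bool waits_for(const struct job *jobs, size_t a, size_t b)
{
    return a != b && jobs[a].waiting && jobs[b].cookie < jobs[a].wait_bound;
}

/* Depth-first search: is target reachable from cur along wait-for edges? */
static bool reaches(const struct job *jobs, size_t n, size_t cur,
                    size_t target, bool *seen)
{
    for (size_t next = 0; next < n; next++) {
        if (!waits_for(jobs, cur, next))
            continue;
        if (next == target)
            return true;
        if (!seen[next]) {
            seen[next] = true;
            if (reaches(jobs, n, next, target, seen))
                return true;
        }
    }
    return false;
}

/* True if job self sits on a wait-for cycle that involves other jobs. */
bool would_deadlock(const struct job *jobs, size_t n, size_t self)
{
    bool seen[MAX_JOBS] = { false };

    assert(n <= MAX_JOBS && self < n);
    return reaches(jobs, n, self, self, seen);
}
```

With domain-local synchronization the edge test in `waits_for()` would also have to know which domain each waiter is flushing, and that is the information that is hard to come by.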
For now, I'm gonna implement simple "I'm not gonna wait for myself"
self-deadlock avoidance. If this needs any more sophistication, I
think we'd better reimplement it so that we can explicitly match up and
track who's gonna wait for what instead of throwing everything into a
single cookie space and then trying to work back from there.
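As a rough illustration of the "not gonna wait for myself" idea, here is a minimal userspace model, not the actual patch. The names `struct async_entry_model`, `lowest_in_progress_excluding()` and `must_keep_waiting()` are hypothetical, and the assumption is that a synchronizer waits until the lowest pending cookie reaches its bound, with the caller's own entry left out of that computation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on the pending list. */
struct async_entry_model {
    unsigned long cookie; /* position in the single cookie space */
    bool done;            /* job has finished running */
};

/*
 * Lowest cookie still in progress, skipping the caller's own entry.
 * self may be NULL when the caller is not an async job.
 * Returns ULONG_MAX when nothing else is pending.
 */
static unsigned long
lowest_in_progress_excluding(const struct async_entry_model *entries, size_t n,
                             const struct async_entry_model *self)
{
    unsigned long lowest = ULONG_MAX;

    for (size_t i = 0; i < n; i++) {
        if (&entries[i] == self || entries[i].done)
            continue;
        if (entries[i].cookie < lowest)
            lowest = entries[i].cookie;
    }
    return lowest;
}

/* Wait condition: is anything other than self still pending below bound? */
bool must_keep_waiting(const struct async_entry_model *entries, size_t n,
                       const struct async_entry_model *self,
                       unsigned long bound)
{
    assert(entries != NULL || n == 0);
    return lowest_in_progress_excluding(entries, n, self) < bound;
}
```

Passing NULL for `self` models the case without the exclusion: a job doing a full synchronize from inside its own callback always finds its own cookie still pending and never stops waiting.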