Avoiding memory-allocation deadlocks

Avoiding memory-allocation deadlocks

Posted Apr 16, 2014 13:22 UTC (Wed) by blackwood (subscriber, #44174)
Parent article: Avoiding memory-allocation deadlocks

Excellent article. One bit I missed, though, is a mention that lockdep tracks the NOFS (or NOIO, I've forgotten which one it tracks) allocation context and will detect and catch potential deadlocks even without actually being under memory pressure. That is a really awesome and powerful debugging feature, and I wonder how it affects the interface-design tradeoffs for all the different ways to specify allocation constraints (i.e. per-allocation, per-process, or global).
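
(As an aside for readers who haven't met these interfaces: kernels of this era already had both a per-allocation and a per-process way of expressing the "don't recurse into I/O" constraint. A minimal sketch follows; GFP_NOFS/GFP_NOIO and memalloc_noio_save()/memalloc_noio_restore() are the real interfaces, while the two caller functions are hypothetical examples, not code from the article:)

    #include <linux/gfp.h>
    #include <linux/sched.h>
    #include <linux/slab.h>

    /* Per-allocation: the constraint is passed with each request. */
    static void *alloc_in_writeback_path(size_t size)    /* hypothetical */
    {
            /* GFP_NOFS: reclaim triggered by this allocation must not
             * re-enter filesystem code. */
            return kmalloc(size, GFP_NOFS);
    }

    /* Per-process: mark the whole task, so allocations made by called
     * code that knows nothing about the constraint are covered too. */
    static void resume_block_device(void)                /* hypothetical */
    {
            unsigned int noio_flags = memalloc_noio_save();

            /* Any allocation here behaves as if GFP_NOIO had been
             * passed, even if the callee asked for GFP_KERNEL. */
            /* ... do the actual resume work ... */

            memalloc_noio_restore(noio_flags);
    }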



Avoiding memory-allocation deadlocks

Posted Apr 16, 2014 14:43 UTC (Wed) by ncm (subscriber, #165) [Link]

Is this really all as haphazard and ad-hoc as the article makes it seem? Is the problem space equally chaotic, so that only a reactive, band-aid approach works?

Avoiding memory-allocation deadlocks

Posted Apr 16, 2014 15:57 UTC (Wed) by RelytDeveloper (guest, #96089) [Link]

I may not be fully qualified to answer your question, but kernel.org uses Git for change management and hosts several instances of cgit, a web-based front end for Git.

The Kernel is big. It's probably too big to read, and likely too big to understand in one sitting.

Tools such as git bisect, find, locate, grep, awk, and sed are necessary to manage such a large code base.

Since we did not always have Git, records of the kernel pre-dating Git are spotty. A git bisect compares the last known good code base against the one containing the regression and tells you which commit accounts for the trouble.
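
For example, a typical bisection session looks roughly like this (the good/bad versions are just illustrative):

    git bisect start
    git bisect bad v3.14       # a kernel that shows the regression
    git bisect good v3.13      # the last kernel known to work
    # build and boot the revision git checks out, then report the result:
    git bisect good            # or: git bisect bad
    # repeat until git names the first bad commit, then clean up:
    git bisect reset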

It is a professional-quality GNU/Linux kernel, but it spans many architectures and lieutenant-headed projects, and eight (?) versions.

Avoiding memory-allocation deadlocks

Posted Apr 19, 2014 7:51 UTC (Sat) by nix (subscriber, #2304) [Link]

ncm wasn't talking about the process of analyzing the problem -- he was talking about the somewhat chaotic and ad-hoc 'add this gfp flag, whoops, that's not enough, make it a pf, now add a mask to ignore it again' iterative process.
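
(To make the "pf plus a mask" step concrete: the per-process flag gets folded back into the gfp mask inside the allocator. The sketch below is a from-memory reconstruction of roughly what the memalloc_noio_flags() helper of that era did, not a quote from the article or the kernel source:)

    /* If the task has been marked PF_MEMALLOC_NOIO, strip the bits that
     * would let reclaim start I/O or re-enter filesystem code, whatever
     * gfp mask the individual call site passed in. */
    static inline gfp_t memalloc_noio_flags(gfp_t flags)
    {
            if (unlikely(current->flags & PF_MEMALLOC_NOIO))
                    flags &= ~(__GFP_IO | __GFP_FS);
            return flags;
    }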

I'd say this is unavoidable, not so much because the solution space is so large as because the requirements change. Things like swap-over-NFS, suspend, early kmalloc, etc. imposed new requirements regarding things like allocator locks that simply weren't there before. E.g., before swap-over-networking turned up, you could safely chuck away network packets that arrived under heavy memory pressure: they'd be retransmitted later and all would be fine. But if those packets might be part of the swapout process, suddenly you can't ignore them any more, and the whole panoply of mm locks collides with the whole panoply of networking locks and the networking layer's own allocations and, well...

I'll admit that I'm amazed that anything got done at all before lockdep was around. Even with lockdep, locking hierarchy maintenance still feels terribly haphazard. There's really not much documentation even for crucial core locks like the mmap_sem (one comment partially documenting core parts of the locking hierarchy, that's all). You're just expected to know, and if you don't know and lockdep doesn't save you (e.g. if spinlocks are involved) expect a nice long fun crashy debugging session.

Avoiding memory-allocation deadlocks

Posted Apr 16, 2014 22:44 UTC (Wed) by dlang (subscriber, #313) [Link]

> Is this really all as haphazard and ad-hoc as the article makes it seem?

yes and no

When a lock is introduced, there is a lot of analysis to try to make sure that there are no problems. That part is not haphazard and ad hoc.

However, when you get a random bug report to analyze, you don't know whether it is a locking error, which lock might be involved, or in what subsystem, let alone what has changed since that lock went into place.

Remember that in the last release cycle there were just over 12,000 different changes from about 1,400 individuals, adding 591,000 lines of code while removing 250,000 lines.

With a code base this large, nobody can keep track of everything, and you can't put out a call to everyone to look at any one bug (let alone all of them).

The bisection approach lets you narrow down where the problem was introduced, so that you can get the correct people involved to troubleshoot it.

It's actually an incredibly efficient process, as much as it seems ad-hoc and haphazard at first glance.

Avoiding memory-allocation deadlocks

Posted Apr 17, 2014 19:23 UTC (Thu) by kjp (subscriber, #39639) [Link]

I was thinking the same thing after reading this article - amazement that it even works.

