
Toward a real "too small to fail" rule

By Jonathan Corbet
March 18, 2024
Kernel developers have long been told that any attempt to allocate memory might fail, so their code must be prepared for memory to be unavailable. Informally, though, the kernel's memory-management subsystem implements a policy whereby requests below a certain size will not fail (in process context, at least), regardless of how tight memory may be. A recent discussion on the linux-mm list has looked at the idea of making the "too small to fail" rule a policy that developers can rely on.

The kernel is unable to use virtual memory, so it is strictly bound by the amount of physical memory in the system. Depending on what sort of workload is running, that memory could be tied up in various ways and unavailable for allocation elsewhere. Allowing allocation requests to fail gives the kernel the freedom to avoid making things worse when memory pressure is high.

There are some downsides to failing an allocation request, of course. Whatever operation needed that memory will also be likely to fail, and that failure will probably propagate out to user space, resulting in disgruntled users. There is also a significant chance that the kernel will not handle the allocation failure properly, even if the developers have been properly diligent. Failure paths can be hard to test; many of those paths in the kernel may never have been executed and, as a consequence, many are likely to have bugs. Unwinding an operation halfway through can be a complex business, which is not the kind of task one wants to see entrusted to untested code.

Recently, Neil Brown started a sub-thread in a wide-ranging discussion on memory-management policies by suggesting a reconsideration of the rules around GFP_KERNEL allocations. Currently, programmers have to be prepared for those calls to fail, even if, in fact, the kernel will not fail small allocations. Brown proposed to make the "too small to fail" behavior a documented rule, at least for allocations below a predefined size. GFP_KERNEL allocations are allowed to sleep, he said, and thus have access to all of the kernel's machinery for freeing memory. In the worst case, the out-of-memory (OOM) killer can be summoned to remove a few processes from the system. If this code is unable to create some free memory, he said, "the machine is a goner anyway". If, instead, GFP_KERNEL allocations would always succeed, he concluded, it "would allow us to remove a lot of untested error handling code".

Kent Overstreet objected to this idea, though. It is common, he said, for kernel code to attempt to allocate memory to carry out a task efficiently, but to be able to fall back to a slower approach if the memory is unavailable; such mechanisms will not work if memory requests do not fail. Even worse, the kernel's efforts to satisfy such requests may worsen performance elsewhere in the system. Without allocation failure, there is no signal to indicate that memory is tight; the implementation of memory overcommit for user space has, he said, made it impossible to use memory efficiently there.

The real solution, he said, is proper testing of all those error paths; "relying on the OOM killer and saying that because [of] that now we don't have to write and test your error paths is a lazy cop out". James Bottomley disagreed, pointing out that the OOM killer only runs in extreme situations, and that error paths are a problem. "Error legs are the least exercised and most bug, and therefore exploit-prone pieces of code in C. If we can get rid of them, we should." Overstreet was unimpressed: "Having working error paths is _basic_, and learning how to test your code is also basic. If you can't be bothered to do that you shouldn't be writing kernel code."

Dave Chinner, instead, was enthusiastically supportive of the idea. The XFS filesystem, he said, was originally developed for a kernel (IRIX) that provided a guarantee for allocations. "A simple change to make long standing behaviour an actual policy we can rely on means we can remove both code and test matrix overhead - it's a win-win IMO."

Brown later modified his proposal slightly, noting that changing the semantics of GFP_KERNEL might cause problems for existing code. Instead, perhaps, GFP_KERNEL could be deprecated entirely in favor of a new set of allocation types. He later suggested this hierarchy:

  • GFP_NOFAIL would explicitly request the "cannot fail" behavior and could, as a result, wait a long time for an allocation request to be fulfilled.
  • GFP_KILLABLE would be the same as GFP_NOFAIL, with the exception that requests would fail in the presence of a fatal signal.
  • GFP_RETRY would make multiple attempts to satisfy an allocation request, but would eventually fail if no progress is made.
  • GFP_NO_RETRY would only allow a single attempt (which could still sleep) at allocating memory, after which the request would fail.
  • GFP_ATOMIC would not sleep at all (which is the current behavior).

Given these options, he said, GFP_KERNEL could go:

I don't see how "GFP_KERNEL" fits into that spectrum. The definition of "this will try really hard, but might fail and we can't really tell you what circumstances it might fail in" isn't fun to work with.

Overstreet responded, once again, that these changes were not needed: "We just need to make sure error paths are getting tested - we need more practical fault injection, that's all." Chinner, instead, commented that GFP_KILLABLE and GFP_RETRY were essentially the same thing; Brown responded that, perhaps, the key distinguishing feature of those allocation types is that they would not invoke the OOM killer; perhaps both of them could be replaced with a single GFP_NOOOM type. "We might need a better name than GFP_NOOOM :-)".

Matthew Wilcox raised a different sort of objection. The proper allocation policy for any given request depends on the context in which the request is made; a function called from an interrupt handler has fewer options available than one running in process context. Sometimes, the code that knows about that context is several steps back in the call chain from the function doing the allocation. The way to set the allocation type, he said, is through the use of context flags applied to the current thread.

Brown, though, pointed out that this context is not the full picture. If code has been written assuming GFP_NOFAIL behavior, it would be incorrect to allow the context to change an allocation into one that could fail: "context cannot add error handling".

Vlastimil Babka worried that deprecating GFP_KERNEL would be an unending task. Instead, guaranteeing "too small to fail" could be done quickly, and modifying specific call sites to allow allocation failure would be a relatively easy task, so he suggested taking that path. Brown, though, answered that removing the big kernel lock also took a long time: "I don't think this is something we should be afraid of". Since redefining GFP_KERNEL also implies removing error-handling code, he said, it should still be handled one call site at a time.

The discussion wound down at about this point, but there is a good chance that we'll be hearing these ideas again. The kernel, for all practical purposes, already implements GFP_NOFAIL behavior for allocations of eight pages or less. Turning the behavior into a guarantee would allow for significant simplification and the removal of a lot of untested code. That is an idea with significant appeal; it would be surprising if it did not come up at the Linux Storage, Filesystem, Memory-Management and BPF Summit in May.

Index entries for this article
Kernel: Memory management/Page allocator



Toward a real "too small to fail" rule

Posted Mar 18, 2024 16:57 UTC (Mon) by Baughn (subscriber, #124425) [Link] (22 responses)

"Just write correct code" appears to be an unfortunately common sentiment.

One should test error paths, of course. But bugs happen, and *any* additional code is another opportunity for failure. If there is an opportunity to reduce the size of the codebase, especially by replacing badly-tested code with a presumably well-tested central allocator, shouldn't we take that opportunity?

Toward a real "too small to fail" rule

Posted Mar 18, 2024 17:15 UTC (Mon) by Sesse (subscriber, #53779) [Link] (21 responses)

Also, does the kernel have real testing infrastructure? I know there are a couple of self-tests you can run on startup, but is there something that scales to hundreds of thousands of test cases giving significant coverage across key systems?

Toward a real "too small to fail" rule

Posted Mar 18, 2024 18:02 UTC (Mon) by atnot (guest, #124910) [Link] (20 responses)

There's a bunch of test systems (some would say too many), but the closest is probably kunit.

Having a unit test system is the easy part though, the hard part is that the code needs to be (re)written in a style that makes unit testing possible in the first place, which really requires most of the community to buy into your system. Otherwise you're stuck in an endless cycle of mocking and stubbing and enormous setup code and slow unit tests and writing dummy drivers etc. which is, in practice, too much annoyance for most developers to bother with. So you end up with whatever random integration tests and shell scripts someone wrote once. Which nobody actually runs during development because it takes way too long.

Incidentally stochastic systems like error injection are really the worst here, because in my experience they always take a ton of effort to set up in a useful way, ages to run with any confidence and give hard to interpret results.

Toward a real "too small to fail" rule

Posted Mar 18, 2024 18:21 UTC (Mon) by Sesse (subscriber, #53779) [Link] (4 responses)

Yes, so there's no basic way of actually knowing the error handling is correct, since it's basically untested? Only really human inspection, which we know requires superhumans to work well?

Toward a real "too small to fail" rule

Posted Mar 18, 2024 20:36 UTC (Mon) by vegard (subscriber, #52330) [Link] (3 responses)

syzbot runs syzkaller on a huge amount of kernel code reachable from userspace (though probably not all of it, under all conceivable conditions). With fault injection you could be pretty sure that it eventually tests most failure paths. I had some patches at some point for detecting novel call stacks and injecting a fault there: https://lore.kernel.org/all/20161016155612.4784-10-vegard...

Toward a real "too small to fail" rule

Posted Mar 21, 2024 2:15 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

ISTR reading an LWN article a while back about kernel developers complaining that syzbot finds too much... is it actually finding non-bugs, or is it finding bugs-that-we-don't-care-about-very-much? If the latter, then I fear this may be a people problem rather than a technical problem.

Toward a real "too small to fail" rule

Posted Mar 21, 2024 7:03 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

Clarifying since I realized this can be misinterpreted: A "people problem" could mean anything from "some developers don't care about writing correct code," all the way to "well, of course they *care*, they just don't have enough people to physically do the work, so it won't get done." Those are both "people problems," but of very different kinds.

Toward a real "too small to fail" rule

Posted Mar 21, 2024 7:33 UTC (Thu) by vegard (subscriber, #52330) [Link]

Not a maintainer, but my impression is that some maintainers are stressed and syzbot is adding to their workload.

I can understand it; if you're completely underwater already and somebody comes in and piles on additional work that isn't even an issue a real person ran into, while you feel the pressure of providing and validating fixes (if nobody else is), it seems unnecessary and annoying.

Toward a real "too small to fail" rule

Posted Mar 18, 2024 18:48 UTC (Mon) by koverstreet (✭ supporter ✭, #4296) [Link] (14 responses)

kunit isn't what you want for testing memory allocation failure paths, you'll never have unit tests for this stuff. You just need to run the big integration tests that you already have (e.g. fstests) with fault injection.

The thing that's made this hard in the past is that you need to be specific about which allocation callsites you're injecting errors into, or your tests will fail in ways that you weren't expecting. That's the capability we don't have yet: we have error injection in the kernel, but it doesn't have facilities for defining per callsite injection points and I've never seen anyone use it for testing memory allocation failures.

I have a fault injection framework that can do this which I'll be posting as soon as memory allocation profiling gets merged. Memory allocation profiling adds all the infrastructure - code tagging, which it uses as well, and it turns allocations into macro calls so we can do per-callsite hooking.

I've used this approach in the past, on early bcachefs/late bcache, and it worked well; test coverage by percentage of lines of source code was > 90%. It's just taken a while to get all the pieces assembled into a form that's upstreamable.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 6:31 UTC (Tue) by PengZheng (subscriber, #108006) [Link] (12 responses)

> kunit isn't what you want for testing memory allocation failure paths, you'll never have unit tests for this stuff. You just need to run the big integration tests that you already have (e.g. fstests) with fault injection.

> The thing that's made this hard in the past is that you need to be specific about which allocation callsites you're injecting errors into, or your tests will fail in ways that you weren't expecting. That's the capability we don't have yet: we have error injection in the kernel, but it doesn't have facilities for defining per callsite injection points and I've never seen anyone use it for testing memory allocation failures.

We did lots of memory allocation error injections in unit tests, though in user space.
And the error injectors are very easy to write, which are actually generated by GitHub Copilot.

https://github.com/apache/celix/blob/master/libs/error_in...

The way to use it is quite straightforward: just link to the CMake target Celix::malloc_ei.

Specifying precise call sites to fail is also easier:
https://github.com/apache/celix/blob/master/libs/error_in...

Toward a real "too small to fail" rule

Posted Mar 19, 2024 10:42 UTC (Tue) by gray_-_wolf (subscriber, #131074) [Link] (11 responses)

> And the error injectors are very easy to write, which are actually generated by GitHub Copilot.

Do I understand it right that the file https://github.com/apache/celix/blob/master/libs/error_in... was generated by Copilot?

Toward a real "too small to fail" rule

Posted Mar 19, 2024 10:52 UTC (Tue) by PengZheng (subscriber, #108006) [Link] (10 responses)

Every single stub was generated by Copilot, not the whole C file.

There is no copyright issue, since there is almost only one way to write a stub, such as

void *__real_malloc(size_t);
CELIX_EI_DEFINE(malloc, void *)
void *__wrap_malloc(size_t size) {
    errno = ENOMEM;
    CELIX_EI_IMPL(malloc);
    errno = 0;
    return __real_malloc(size);
}

There is actually no real difference whether it is written by hand or generated by AI.
By the way, it was us who created the pattern.
The AI just recognizes the pattern we created and follows it.

Rest assured that there is no copyright issue involved.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 11:16 UTC (Tue) by gray_-_wolf (subscriber, #131074) [Link] (9 responses)

Actually what surprised me was the license of the file, since I was under the impression that LLM produced output is not copyrightable. So I am trying to understand why the file is under Apache license 2 instead of, I don't know, public domain or something.

Would you happen to know what the consensus here is? Can you just slap any license you want on the output of copilot?

Toward a real "too small to fail" rule

Posted Mar 19, 2024 11:30 UTC (Tue) by PengZheng (subscriber, #108006) [Link] (2 responses)

> Would you happen to know what the consensus here is? Can you just slap any license you want on the output of copilot?

If there is no difference between AI-generated output and hand-written code, since there happens to be only one way of writing a stub, why bother worrying about copyright issues? Not to mention that I invented the way in question.

Moreover, the mentioned tests can be put under public domain or any license (because they are not essential part of the project), and are not installed anywhere outside of the project.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 12:45 UTC (Tue) by paulj (subscriber, #341) [Link] (1 responses)

If you are relying on the argument that there is only 1 way to write something, and hence there is nothing creative to it, and hence it is not copyrightable, then... surely you can not claim a copyright over it, and hence can not put a licence notice on it either?

It sounds like you should get some advice on this?

Toward a real "too small to fail" rule

Posted Mar 19, 2024 12:53 UTC (Tue) by PengZheng (subscriber, #108006) [Link]

Is there any argument? I just said there is nothing to worry about, and I really don't care about these issues.
The licenses put there are just IDE boilerplate.

If anyone is interested in correcting this, just send a PR to remove these license claims; I will happily apply it.
Unless some lawyers find me, I won't bother worrying about this and will leave it as is.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 11:35 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)

You shouldn't be able to slap ANY copyright on the output of Co-Pilot. After all, it's not your work. HOWEVER.

If you fed it a template and said "write in this style", then your input is your copyright. Co-Pilot's work is a derived work, therefore your copyright flows through to the output.

Equally, if you review AND CORRECT Co-Pilot's work, then you have created a derivative work, and you can claim copyright in the derivative.

But no, as a fundamental rule I think it's been pretty much decided that Co-Pilot's work is not copyrightable, what matters is the pre-existing copyrights on the works Co-Pilot used as input, and post-process copyrights added to the output.

Europe I think has specifically said using copyrighted works as INPUT is not infringing, so the LLM "academics" are not breaching copyright - they have NOT said that using the output is non-infringing ...

Cheers,
Wol

Toward a real "too small to fail" rule

Posted Mar 19, 2024 13:46 UTC (Tue) by josh (subscriber, #17465) [Link] (2 responses)

> But no, as a fundamental rule I think it's been pretty much decided that Co-Pilot's work is not copyrightable

This has not been legally determined yet, and cautious users should avoid assuming this. It may also end up depending on jurisdiction.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 16:45 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

Well if a Gorilla can't get copyright, I seriously doubt an AI can!

I think it's a pretty safe bet you cannot put an AI down as the copyright holder. An AI can't ADD copyright, nor can it REMOVE it.

Feeding stuff in is (in places like Europe) defined in law as "non-infringing", aiui, but it says nothing about getting stuff out again :-)

Cheers,
Wol

Toward a real "too small to fail" rule

Posted Mar 19, 2024 21:22 UTC (Tue) by josh (subscriber, #17465) [Link]

Ah, when you said "Co-Pilot's work is not copyrightable", I thought you were saying you thought AI output was in general not copyrightable (not legally determined), rather than that the AI itself cannot be an author and hold copyright (legally quite certain barring unlikely changes in law).

> nor can it REMOVE it.

We're likely on the same page, then. Hopefully courts agree.

Toward a real "too small to fail" rule

Posted Mar 21, 2024 11:33 UTC (Thu) by Karellen (subscriber, #67644) [Link] (1 responses)

If you fed it a template and said "write in this style", then your input is your copyright. Co-Pilot's work is a derived work, therefore your copyright flows through to the output.

Hmmm..... if you wrote a story where half the characters were lifted from another story, and all their dialogue were copied verbatim from that story, with your new characters reacting differently to those events, then your inputs from the new characters is your copyright, and the work as a whole is derived from your work.

...but, it is also derived from the story you got the other characters from. Just because some of the work is yours, does not automatically give you the right to claim copyright over the whole work. See also, music sampling. Just because you wrote 95% of a song, that does not allow you to claim copyright over a small sample you lifted from elsewhere and looped in the background of your track. Even if you chop a small sample up into even smaller pieces and re-order them to be almost unrecognisable, that fact that you got those pieces from an existing work is the issue.

I don't think it's reasonable to conclude that the copyright of all the other inputs to a final work (e.g. all the other inputs to Co-Pilot's training) can just be ignored, simply because you have a claim to one of the inputs, or even to a majority of the inputs.

Toward a real "too small to fail" rule

Posted Mar 21, 2024 15:10 UTC (Thu) by Wol (subscriber, #4433) [Link]

>> If you fed it a template and said "write in this style", then your input is your copyright. Co-Pilot's work is a derived work, therefore your copyright flows through to the output.

> Hmmm..... if you wrote a story where half the characters were lifted from another story, and all their dialogue were copied verbatim from that story, with your new characters reacting differently to those events, then your inputs from the new characters is your copyright, and the work as a whole is derived from your work.

Are you reading via "Unread comments"? It's great, but it does strip context.

My intent (which I thought was obvious) was that the template was YOUR WORK, therefore YOUR COPYRIGHT. Therefore Co-Pilot's (or any other AI's) work is *also* your copyright.

Cheers,
Wol

Toward a real "too small to fail" rule

Posted Mar 19, 2024 9:48 UTC (Tue) by roc (subscriber, #30627) [Link]

You really need fault injection unit tests if you want this to work. Just running an integration test and injecting faults randomly, it's too easy to regress specific call sites and not get caught.

Toward a real "too small to fail" rule

Posted Mar 18, 2024 17:53 UTC (Mon) by mb (subscriber, #50428) [Link] (5 responses)

The statement that error paths are hard to test or almost impossible to test is wrong, though.

It is true that testing error paths in a system test is hard and often impossible.
But testing error paths in a unit test is pretty straightforward.
That of course means there must be unit tests. Which is rarely the case for kernel code.

Toward a real "too small to fail" rule

Posted Mar 18, 2024 20:29 UTC (Mon) by zorro (subscriber, #45643) [Link] (4 responses)

"The kernel, for all practical purposes, already implements GFP_NOFAIL behavior for allocations of eight pages or less"

How do you test an error path if there is no practical way to trigger it?

Toward a real "too small to fail" rule

Posted Mar 18, 2024 20:35 UTC (Mon) by mb (subscriber, #50428) [Link] (3 responses)

Fault injection and/or unit tests with stubbed/instrumented allocation.

What you are referring to is the system test with the normal kernel allocator. Which of course makes it impossible, as I already said.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 2:57 UTC (Tue) by shemminger (subscriber, #5739) [Link] (2 responses)

This has been done in the past with a modified kernel allocator that returns failure randomly.
But there are so many error paths, and most of them just lead to an unusable system (i.e., a panic on boot).

Toward a real "too small to fail" rule

Posted Mar 19, 2024 6:04 UTC (Tue) by sima (subscriber, #160698) [Link]

The fault injection framework allows you to limit faults to specific processes, which is I think the mode syzkaller uses. This still means you won't be able to test error paths that are executed from kernel threads and stuff like that (unless extra care is taken to include those), but there's a lot of error paths you can test.

I think the next level would be if syzkaller could decide which allocations to fail, so that it can include those paths in its fuzzing exploration. But I haven't heard of anyone trying that yet.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 6:27 UTC (Tue) by mb (subscriber, #50428) [Link]

I am not talking about a system test. I'm talking about a unit test.
There is no boot or "unusable system" in a unit test.

Toward a real "too small to fail" rule

Posted Mar 18, 2024 19:31 UTC (Mon) by bof (subscriber, #110741) [Link]

> "We might need a better name than GFP_NOOOM :-)".

Eh, isn't the answer to that obvious? It's about Disabling OOM, right?

--> GFP_DOOM

Toward a real "too small to fail" rule

Posted Mar 19, 2024 8:47 UTC (Tue) by taladar (subscriber, #68407) [Link] (10 responses)

Basically the discussion on the anti-testing side seems to boil down to "I want the problem that in the real world this operation can fail to be someone else's problem because handling that failure would be too hard to do correctly"

To me this all reads as the death throes of the "sufficiently smart and disciplined" programmer sentiment that underlies the whole philosophy of using languages like C which do not give you much help checking the correctness of your code.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 11:00 UTC (Tue) by azumanga (subscriber, #90158) [Link]

If your kernel has reached the point where a 4K allocation doesn't work, you are very likely to hit a cascade effect, where attempts at recovering also fail to allocate, and allocations fail all over the kernel.

I've worked, once, on a system where we really cared about this type of thing -- but it was a very tiny embedded program, and it was still incredibly hard to set up enough testing that we were confident we hit all the possible failure cases. I can't imagine doing it in practice for something the size of the kernel, and I'm also not sure there would be any useful benefit to the kernel doing slightly better at surviving when such tiny allocations can't be satisfied, rather than just giving up altogether.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 14:20 UTC (Tue) by ballombe (subscriber, #9523) [Link] (1 responses)

Untested branch code is a problem in any language. For C, we have gcov for that.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 16:56 UTC (Tue) by kleptog (subscriber, #1183) [Link]

Some languages more than others though. In strict languages like Rust (and to some extent C++) the compiler can check and ensure invariants are not violated in the error paths even if they are never run.

So stuff like use-after-free, leaked memory, and duplicated or leaked locks are not possible, and since those are precisely the kind of things that go wrong in error paths, that's a huge win.

If your error handling is more complicated than 'clean up and return error' you probably need a test case anyway. But not having to write test cases for the common case saves a lot of time.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 17:21 UTC (Tue) by mb (subscriber, #50428) [Link] (2 responses)

For me the biggest problem with guaranteeing a too-small-to-fail threshold is that you can't ever lower the threshold without first auditing all allocations kernel-wide.
I do realize that is basically also true today, because an error path could be broken.
But I still think there is a difference between having an error path that is probably correct and having no error path at all. In the second case you would have to audit the allocation and *write* the error path if you wanted to lower the threshold.

Most error paths are actually not that hard to get right and could actually be automatically implemented in Rust without any manual code. (Drop is called automatically, in the correct order)

Toward a real "too small to fail" rule

Posted Mar 20, 2024 0:09 UTC (Wed) by neilbrown (subscriber, #359) [Link] (1 responses)

> For me the biggest problem with guaranteeing a too-small-to-fail threshold is that you can't ever lower the threshold without auditing all allocations kernel-wide before.

I think that API would have to guarantee that the allocation never fails - no matter what size. It may, however, block indefinitely.

The threshold would be implemented by a WARN_ON every time there was an attempt to allocate larger than the threshold.
We might also have a developer-mode where a message is generated each time there is a request with a larger size than the previous max - like the "used greatest stack depth" messages.
That way developers would have help to ensure it is only used for "small" allocations.

Toward a real "too small to fail" rule

Posted Mar 20, 2024 10:38 UTC (Wed) by farnz (subscriber, #17727) [Link]

In that respect, your proposal for a hierarchy of wait versus fail options from "retry forever until you succeed" through to "fail immediately" is effective; instead of allocations never failing because they're small, they're failure-proof because the caller has said "retry forever until you succeed".

If I understand this proposal fully, you'd then have a warning on "large" allocations that were marked to retry forever, since they're likely to block forever, and the "too small to fail" threshold becomes the warning threshold for allocations that are likely to block forever, instead of an API guarantee.

Toward a real "too small to fail" rule

Posted Mar 19, 2024 23:38 UTC (Tue) by neilbrown (subscriber, #359) [Link] (1 responses)

> "I want the problem that ..... to be someone else's problem"

This is an important part of any engineering practice, including software engineering. Building complex systems is difficult and we have to share the load among different sub-specialists.

So we have the compiler solve some of our problems, and static analysis and testing solve some of our problems. We use introspective helpers like lockdep. We use correctness annotations like assert() (aka BUG_ON()) and might_sleep() etc. We create library code and encourage its use. We look for patterns in other similar code and copy them.

Any problem that CAN be referred to some person or system with more specialized knowledge/skills should be. Trying to do it all yourself is a recipe for failure.

And removing error handling isn't just about removing possible bugs. It also removes dead code and so removes pressure on the icache.

Toward a real "too small to fail" rule

Posted Mar 20, 2024 10:13 UTC (Wed) by farnz (subscriber, #17727) [Link]

It's also why RAII is such a useful pattern whenever supported - it allows you to wrap up a resource so that users of the resource don't need to remember clean-up. Something like devres in the kernel, for example, where you claim resources as you want to work with them, and detaching from the device automatically frees all your resources for you.

Toward a real "too small to fail" rule

Posted Mar 20, 2024 20:28 UTC (Wed) by dvdeug (subscriber, #10998) [Link] (1 responses)

Brian Kernighan, one of the authors of C, said "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" So simplifying things, like making certain memory handling more centralized, certainly seems to be in the philosophy of the creators of C.

Toward a real "too small to fail" rule

Posted Mar 20, 2024 20:47 UTC (Wed) by mb (subscriber, #50428) [Link]

Actually, even more in the philosophy of the creators of C (by that definition) would be to use Rust, because it just auto-generates correct error handling code for the vast majority of cases.
;)


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds