
Allocation failures

Posted Apr 25, 2026 8:55 UTC (Sat) by mb (subscriber, #50428)
In reply to: Allocation failures by devdanzin
Parent article: Using LLMs to find Python C-extension bugs

> OOM-kill of a web worker handling 1000 concurrent requests takes down 999 innocent requests if one request tries to over-allocate.

If the one process caused the OOM, the 999 are *not* innocent. They used up all the memory.
The system is designed incorrectly, if this can happen.
OOM is an emergency situation that cannot be handled in a sane way. Even if the one process handles its NULL pointers correctly, the system's about-to-be OOM state persists and the next request will run into it.
The system is already dead.
The correct handling is to kill processes to free up significant amounts of memory instead of handling the failures that will keep happening.



Allocation failures

Posted Apr 26, 2026 13:20 UTC (Sun) by devdanzin (subscriber, #183390)

> If the one process caused the OOM, the 999 are *not* innocent. They used up all the memory.
> The system is designed incorrectly, if this can happen.

In Python, it's possible to trigger a `MemoryError` without the memory being all used up. So the 999 can be innocent IMO. And that is without going into transient OOMs, where the system memory is exhausted by another process that gets OOM terminated. Your innocent Python process may well keep running correctly after getting a few allocation errors.

Examples of synthetic code that will raise a `MemoryError` with plenty of memory left (in fact, these work as the first line typed in the REPL):
>>> string = "a" * 9223372036854775807
>>> x = 1 << 1000000000000
>>> import decimal; decimal.getcontext().prec = decimal.MAX_PREC; decimal.Decimal(1) / 3
>>> from itertools import product; next(product(range(1 << 30), repeat=2))  # causes a "real" `MemoryError` (exhausts memory in the system), but it may be recoverable if the process is not OOM-killed. Best fit for aborting, though.

You may say that a system that lets something like this happen is incorrectly designed, but it isn't such a far-fetched situation. People do get this kind of `MemoryError` in production, where expected input works but untrusted or problematic input triggers the error. And aborting in all of these cases may create a DoS where one need not exist.

> OOM is an emergency situation that cannot be handled in a sane way. Even if the one process handles its NULL pointers correctly, the system's about-to-be OOM state persists and the next request will run into it.
> The system is already dead.
> The correct handling is to kill processes to free up significant amounts of memory instead of handling the failures that will keep happening.

I do not agree. As shown above, you can get a `MemoryError` in CPython because the requested allocation is too big, even with plenty of memory left. That is a recoverable situation: the system isn't necessarily near OOM, nor dead. So killing processes isn't always the right call.
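
A minimal sketch of that recoverable case, assuming CPython and a size that comes from untrusted input. `handle_request` is a hypothetical name for illustration, not part of any framework:

```python
import sys


def handle_request(repeat_count):
    """Hypothetical handler: builds a string whose length comes from untrusted input."""
    try:
        payload = "a" * repeat_count
    except MemoryError:
        # CPython refused the oversized request up front, so this process
        # is still healthy and can keep serving other requests.
        return "rejected: allocation too large"
    return f"ok: {len(payload)} chars"


# One hostile request sits between two ordinary ones.
results = [handle_request(n) for n in (10, sys.maxsize, 20)]
print(results)
```

The requests before and after the oversized one complete normally, which is the point: aborting here would have taken them down for no reason.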

So, all in all, I think there are plenty of situations where defending against and handling `MemoryError` in CPython makes sense. Of course, there are situations like you describe where aborting would be the right choice. Given all the above, what do you think?


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds