Expedited memory reclaim from killed processes
Posted Apr 14, 2019 20:15 UTC (Sun) by rweikusat2 (subscriber, #117920) In reply to: Expedited memory reclaim from killed processes by wahern
Parent article: Expedited memory reclaim from killed processes
copy of the memory used for rule data structures.
Without overcommit, that is, lazy memory/swap allocation as the need arises, this wouldn't work (in this way).
"Refuse to try because the system might otherwise run out of memory in future" is not a feature. The system might as well not. The kernel doesn't know this. Or it might run out of memory for a different reason. The original UNIX fork worked by copying the forking core image to the swap space to be swapped in for execution at some later time. This was a resource management strategy for seriously constrained hardware, not a heavenly revelation of the one true way of implementing fork.
Posted Apr 15, 2019 13:56 UTC (Mon)
by epa (subscriber, #39769)
[Link] (9 responses)
Perhaps programming languages could have better support for marking a data structure read-only, which would then notify the kernel to mark the corresponding pages read-only. Then you could allocate the necessary structure and mark it read-only before forking.
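For illustration, the "allocate, mark read-only, then fork" sequence can already be written by hand with mmap() and mprotect(). This is only a minimal sketch: the helper names make_readonly_table() and child_reads_table() are made up for this example, and it shows the protection change only. Whether the kernel's commit accounting takes that protection into account is a separate question that the sketch does not answer.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: build a table in anonymous memory, then make it
 * read-only. len must be at least 1; the mapping starts out zero-filled,
 * so the copied string stays NUL-terminated. */
static char *make_readonly_table(size_t len, const char *contents)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    strncpy(p, contents, len - 1);      /* populate while still writable */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Hypothetical helper: fork, let the child read the inherited table, and
 * return the child's exit status (0 if it saw the expected first byte). */
static int child_reads_table(const char *table, char expected_first)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(table[0] == expected_first ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After make_readonly_table() returns, any write to the table raises SIGSEGV in both parent and child, while reads keep working on the shared pages.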
Posted Apr 15, 2019 15:01 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (8 responses)
I believe opting-in to overcommit is already possible, with the MAP_NORESERVE flag - which essentially says that the mapped range can be overcommitted, and defines behaviour if you write to it when there is insufficient commit available.
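As a rough sketch of what opting in looks like at the call site, the flag is simply OR-ed into the mmap() flags. The helper name map_noreserve() is made up for this example, and the sketch makes no claim about what the kernel does with the flag on a given system; that depends on the overcommit mode and is disputed further down in this thread.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MAP_NORESERVE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: request a private anonymous mapping and ask the
 * kernel not to reserve swap space for it. Returns NULL on failure. */
static void *map_noreserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

The returned range is used like any other mapping and released with munmap().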
There's a bit of a chicken-and-egg problem here, though - heuristic overcommit exists because it's easier for system administrators to tell the OS to lie to applications that demand too much memory than it is for those self-same administrators to have the applications retooled to handle overcommit sensibly.
And even if you are retooling applications, it's often easier to simply turn on features like Kernel Same-page Merging to cope with duplication (e.g. in the Suricata ruleset in-memory form) than it is to handle all the fun that comes from opt-in overcommit.
Posted Apr 18, 2019 6:26 UTC (Thu)
by thestinger (guest, #91827)
[Link] (1 responses)
Posted Apr 18, 2019 7:26 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Ah - on other systems (Solaris, at least, and IRIX had the same functionality under a different name), which do not normally permit any overcommit, it allows you to specifically flag a memory range as "can overcommit". If application-controlled overcommit ever becomes a requirement on Linux, supporting the Solaris (and documented) semantics would be a necessary part.
Posted Apr 18, 2019 6:32 UTC (Thu)
by thestinger (guest, #91827)
[Link] (5 responses)
The linux-man-pages documentation is often inaccurate, as it is in this case. MAP_NORESERVE does not do what it describes at all:
> When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available.
Posted Apr 18, 2019 6:40 UTC (Thu)
by thestinger (guest, #91827)
[Link]
Mappings that aren't committed and cannot be committed without changing protections don't have an accounting cost (see the official documentation that I linked) so the way to reserve lots of address space is by mapping it as PROT_NONE.
To make memory that has been used stop being accounted while keeping the address space, you clobber it with a new PROT_NONE mapping using mmap with MAP_FIXED. It may seem that you achieve the same thing with madvise MADV_DONTNEED + mprotect to PROT_NONE, but that doesn't work, since the kernel doesn't actually go through it all to check whether it can reduce the accounted memory (for good reason).
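The reserve / commit / give-back pattern described above could be sketched roughly as follows. The helper names reserve(), commit() and decommit() are made up for this example, and addresses and lengths are assumed to be page-aligned.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space only: a PROT_NONE mapping cannot be touched. */
static void *reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Make part of a reservation usable; this is the call that can fail
 * if the memory cannot be committed. */
static int commit(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Give the memory back but keep the address range: replace the old
 * pages with a fresh PROT_NONE mapping at the same address. */
static int decommit(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return p == addr ? 0 : -1;
}
```

A range that has been through decommit() and is then committed again comes back zero-filled, because the old pages were discarded when the mapping was replaced.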
Posted Apr 18, 2019 12:54 UTC (Thu)
by corbet (editor, #1)
[Link]
Posted Apr 18, 2019 15:24 UTC (Thu)
by rweikusat2 (subscriber, #117920)
[Link] (2 responses)
Posted Apr 19, 2019 15:03 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link]
Posted Apr 25, 2019 14:30 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Apr 15, 2019 15:54 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
A better designed software would store rules in a file and map it explicitly into the target processes. This way there's no problem with overcommit - the kernel would know that the data is meant to be immutable.
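A minimal sketch of that approach, assuming the rules have already been serialized into a file in a form that can be used in place: map_rules() is a made-up helper name, not anything taken from Suricata.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a rules file read-only and shared, so every
 * process that maps it uses the same page-cache pages.
 * Returns NULL on failure; on success stores the length in *len_out. */
static const void *map_rules(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

Because the mapping is file-backed and read-only, its pages can be dropped and reread from the file under memory pressure instead of needing swap.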
Posted Apr 15, 2019 17:03 UTC (Mon)
by rweikusat2 (subscriber, #117920)
[Link] (3 responses)
This statement means nothing (as it stands).
> A better designed software would store rules in a file and map it explicitly into the target processes. This way there's no problem
A much more invasive change to suricata (an open source project I'm in no way associated with) could have gotten rid of all the pointers in its internal data structures. Assuming this had been done and the code had also been changed to use a custom memory allocator instead of the libc one, one could have used a shared memory segment or memory-mapped file to implement the same kind of sharing. I'm perfectly aware of this. But this complication isn't really necessary with Linux, as sharing-via-fork works just as well and is a lot easier to implement.
Posted Apr 15, 2019 17:10 UTC (Mon)
by rweikusat2 (subscriber, #117920)
[Link]
But that's still more complicated than just relying on the default behaviour based on knowing how the application will use the inherited memory.
Posted Apr 15, 2019 17:30 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (1 responses)
You could also, assuming it's backed by an mmaped file, just use MAP_FIXED to ensure that all the pointers match in every Suricata process; this works out best on 64-bit systems, as you need a big block of VA space available that ASLR et al won't claim.
Posted Apr 15, 2019 19:14 UTC (Mon)
by rweikusat2 (subscriber, #117920)
[Link]
Posted Apr 26, 2019 18:26 UTC (Fri)
by roblucid (guest, #48964)
[Link]
The disk-backed data files can be shared among thousands of VMs, right?
Then the VM system can be sure it's safe to fork without committing much memory, and the apparent need for over-commit vanishes. I admit I haven't tried it: I used VMs for isolation and jails for data sharing, not this kind of efficiency hack. Conceptually, though, I don't see why software developed in a stricter world couldn't handle the case reasonably.
Sparse arrays are perhaps a better case for over-commit, but again I wonder whether memory-mapped files and/or smarter data structures wouldn't be feasible for the programs that deliberately require these features, rather than relying on them by accident because of a permissive environment.
The man pages are actively maintained. I am sure that Michael would appreciate a patch fixing the error.
Man pages
JFTR: On Linux, applications can actually handle SIGSEGV,
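#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* SIGSEGV handler: grow the heap so the faulting store can be retried. */
static void do_brk(int unused)
{
    (void)unused;
    sbrk(128);
}

int main(int argc, char **argv)
{
    unsigned *p;

    if (argc < 2)
        return 1;

    signal(SIGSEGV, do_brk);
    p = sbrk(0);            /* current program break, just past the heap */
    *p = atoi(argv[1]);     /* expected to fault; restarted after do_brk() */
    printf("%u\n", *p);
    return 0;
}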
If the signal handler is disabled, this program segfaults. Otherwise, the handler extends the heap and the faulting instruction then succeeds when it is restarted. SIGSEGV is a synchronous signal; hence, this would be entirely sufficient to implement some sort of OOM-handling strategy in an application, e.g., free some memory and retry, or wait some time and retry.
> JFTR: On Linux, applications can actually handle SIGSEGV,
I'd be surprised if there were any Unixes on which this was not true, given that SIGSEGV in particular was one of the original motivations for the existence of signal handling in the first place.
> with overcommit - the kernel would know that the data is meant to be immutable.
