The SLUB allocator
Posted Apr 12, 2007 21:43 UTC (Thu) by flewellyn (subscriber, #5047)
I'm yet again left thinking what would happen if the kernel devs actually thought about proper garbage collection.
Posted Apr 12, 2007 23:57 UTC (Thu) by rwmj (subscriber, #5474)
Posted Apr 12, 2007 23:58 UTC (Thu) by bronson (subscriber, #4806)
Posted Apr 13, 2007 9:36 UTC (Fri) by nix (subscriber, #2304)
(You *need* proper garbage collection handling cycles if you implement AF_UNIX sockets with fd passing, or any bugger can DoS you grotesquely.)
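The cycle nix alludes to can be reproduced from userspace. The sketch below (an illustration only, not kernel code; the helper name is made up) passes an AF_UNIX socket's own file descriptor over itself with SCM_RIGHTS. Once both userspace fds are closed, the only remaining reference to the socket sits in its own receive queue — a cycle plain refcounting can never free, which is why the kernel carries a cycle-collecting GC for in-flight unix sockets:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Queue an AF_UNIX socket's own fd onto its own receive queue via
 * SCM_RIGHTS, then close both userspace fds.  The in-flight fd keeps
 * the socket alive even though userspace can no longer reach it.
 * Returns 0 on success, -1 on any syscall failure. */
static int make_fd_cycle(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    char data = 'x';
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    memset(ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl,
        .msg_controllen = sizeof(ctrl),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &sv[0], sizeof(int));

    /* Deliver to sv[0]'s receive queue a message carrying a reference
     * to sv[0] itself: the cycle is now in place. */
    if (sendmsg(sv[1], &msg, 0) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* After these closes, the queued reference is the only one left. */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Without the kernel's cycle collector, a loop around this function would leak socket memory indefinitely — nix's DoS.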
Posted Apr 19, 2007 10:20 UTC (Thu) by jfj (guest, #37917)
I mean, why do that: close the file descriptor, but leave the memory around for the GC?
A GC pass would then have to traverse the entire object space and kill all the caches for good! Just free the damn thing while you're there.
Posted Jul 5, 2010 11:14 UTC (Mon) by marcH (subscriber, #57642)
So can I have both please?
Posted Jul 5, 2010 14:49 UTC (Mon) by nye (guest, #51576)
Posted Apr 22, 2007 22:50 UTC (Sun) by RareCactus (guest, #41198)
Things that are good practice for sysadmins writing perl scripts or programmers writing Java or COBOL "business logic" are not good practice for kernel hackers.
Garbage collection is about managing memory lazily. It will always hold on to memory longer than equivalent non-GC code would. It will trash the data cache by rummaging through things unnecessarily.
Traditional garbage collection algorithms like mark and sweep and generational GC tend to require the system to come to a "stop" while the garbage collector thread scans through everything. It should be obvious to anyone that this is unacceptable for a kernel, especially on a multi-core system. There are some newer algorithms that mitigate this behavior somewhat, at the cost of even worse performance.
The kernel is about efficiency. Yes, even in 2007. It's about writing code that everyone will use. It's about spending 2x the time to get a 10% speedup. It's about scaling up to 4, 8, 16 CPUs, and down to a 50 MHz antique.
The kernel manages many other resources besides memory: things like network connections, file descriptors, IRQs, hardware stuff. malloc() might make your head spin, but it's not as hard to write as the TCP stack.
Userspace can and should move more and more to high-level languages with garbage collection, first-class functions, the whole nine yards. I like Ruby, myself. And there are certain resources in the kernel that might benefit from a GC-like approach. But memory is not one of them.
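The stop-the-world behavior described above is easy to see in miniature. This is a toy mark-and-sweep collector (a sketch for illustration, nothing like real kernel code): collect() must clear every mark, trace every root, and scan the whole heap before the mutator may continue, and that pause grows with heap size:

```c
#include <stddef.h>

#define HEAP_NODES 8

/* A fixed pool of GC-managed nodes, each with two outgoing edges. */
struct node {
    struct node *left, *right;
    int marked;
    int in_use;
};

static struct node heap[HEAP_NODES];

static struct node *gc_alloc(void)
{
    for (int i = 0; i < HEAP_NODES; i++)
        if (!heap[i].in_use) {
            heap[i].in_use = 1;
            heap[i].left = heap[i].right = NULL;
            return &heap[i];
        }
    return NULL; /* out of memory: a real GC would collect here */
}

/* Mark phase: recursively flag everything reachable. */
static void mark(struct node *n)
{
    if (!n || n->marked)
        return;
    n->marked = 1;
    mark(n->left);
    mark(n->right);
}

/* Stop-the-world collection: no mutator runs while this walks the
 * entire heap twice (clear + sweep) and the whole live graph once.
 * Returns the number of nodes reclaimed. */
static int collect(struct node **roots, int nroots)
{
    int freed = 0;

    for (int i = 0; i < HEAP_NODES; i++)
        heap[i].marked = 0;
    for (int i = 0; i < nroots; i++)
        mark(roots[i]);
    for (int i = 0; i < HEAP_NODES; i++)
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            freed++;
        }
    return freed;
}
```

Even this toy version touches every object on every collection — exactly the cache-trashing, pause-inducing traversal the comment objects to.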
Posted Jan 12, 2009 9:52 UTC (Mon) by Blaisorblade (guest, #25465)
But it doesn't apply in kernel space, apart from existing usage of manual refcounting, for several reasons:
- GCs depend on wasting memory space. A copying GC needs, by definition, twice the space actually used by the application (though in practice one would use copying only for the young generation of a generational GC). Refcounting uses exactly the space needed, no more, no less.
- a recent paper argued that a GC needs around 5x the live memory to be faster than manual allocation, so that the cost of a scavenge (which is always big, even with generations) is paid infrequently enough. That's not an option in the kernel.
- incremental/real-time GCs (which do not stop execution) are much slower even when finely tuned, and extremely hard to get right.
- kernel devs are extremely concerned about cache impact, not least because they'd like to leave the caches to applications as much as possible. And any GC other than refcounting is notoriously bad for locality.
- when you work in a dynamic language and already have a VM, you might as well implement a GC, but here there is no VM whatsoever.
- All that is with general purpose memory allocators. The kernel makes extensive use of the SLAB/SLUB interface instead of kmalloc(), and that allocates fixed-size objects.
Even kmalloc is implemented on top of that: allocation sizes are rounded up, mostly to the next power of two, with a few exceptions. A 700-byte object will actually use 1024 bytes. See include/linux/kmalloc_sizes.h for the actual existing sizes.
- All the discussion argues about automatic refcounting. That means incrementing refcounts on each function call, because you are pushing parameters on the operand stack. No kidding - CPython is like that. Multithreaded CPython was 2x slower because of atomic refcounts and locks around name lookups, and that was CPython 1.5.2 IIRC. Nowadays the rest of the interpreter is better optimized (even if it still hugely sucks), so the same number of refcount operations would have an even bigger relative impact.
Now, the kernel obviously does not use refcounting like that. If they can live without recursive locks, they can also avoid incrementing a refcount in a thread which already owns a reference.
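The manual style of refcounting described above can be sketched in plain C11 — modeled loosely on the kernel's struct kref but entirely self-contained, with illustrative names. obj_get()/obj_put() use atomic operations, and a function that merely borrows a reference its caller already holds touches no refcount at all, which is exactly the saving over CPython-style per-call counting:

```c
#include <stdatomic.h>

struct object {
    atomic_int refcount;
    void (*release)(struct object *);
};

/* Number of times a release callback has fired; for demonstration. */
static int releases;

static void obj_release_count(struct object *obj)
{
    (void)obj;
    releases++;
}

/* A fresh object starts with one reference, owned by the creator. */
static void obj_init(struct object *obj, void (*release)(struct object *))
{
    atomic_init(&obj->refcount, 1);
    obj->release = release;
}

/* Take an additional reference (atomic, like kref_get()). */
static void obj_get(struct object *obj)
{
    atomic_fetch_add(&obj->refcount, 1);
}

/* Drop a reference; the last put triggers the release callback. */
static void obj_put(struct object *obj)
{
    if (atomic_fetch_sub(&obj->refcount, 1) == 1)
        obj->release(obj);
}

/* Borrows the caller's reference: because the caller is guaranteed to
 * hold one for the duration of the call, no get/put pair is needed --
 * the point made above about threads that already own a reference. */
static int obj_refcount(const struct object *obj)
{
    return atomic_load(&((struct object *)obj)->refcount);
}
```

Refcounts are thus adjusted only at ownership transfers, not on every function call, so the CPython-style 2x atomic-operation overhead never materializes.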
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds