Zig heading toward a self-hosting compiler
Posted Oct 8, 2020 2:21 UTC (Thu) by roc (subscriber, #30627)
In reply to: Zig heading toward a self-hosting compiler by khim
Parent article: Zig heading toward a self-hosting compiler
For almost every application, it simply isn't worth handling individual allocation failures. It *does* make sense to handle allocation failure as part of large-granularity failure recovery, e.g. by isolating large chunks of your application in separate processes and restarting them when they die. That works just fine with "fatal" OOM handling.
In theory, C and C++ support handling of individual allocation failures. In practice, it's very hard to find any C or C++ application that reliably does so. The vast majority don't even try and most of the rest pretend to try but actually crash in any OOM situation because OOM recovery is not adequately tested.
Adding OOM errors to every library API just in case one of those unicorn applications wants to use the library adds API complexity just where you don't want it. In particular, a lot of API calls that normally can't fail now have a failure case that needs to be handled/propagated.
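To illustrate (with hypothetical C names, not any particular library's API): an operation that previously could not fail now returns an error that every caller must check or propagate.

    #include <errno.h>

    struct counter;

    /* Infallible version: callers can ignore the result entirely. */
    void counter_increment(struct counter *c);

    /* Once the operation may allocate (say, to grow an internal log),
     * the signature grows a failure case, and so does every caller's: */
    int counter_increment_logged(struct counter *c);  /* 0 on success, -ENOMEM on OOM */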
Therefore, Rust made the right call here, and Zig --- although it has some really cool ideas --- made the wrong call.
Posted Oct 8, 2020 8:54 UTC (Thu)
by smcv (subscriber, #53363)
[Link]
dbus, the reference implementation of D-Bus, is perhaps a good example: it's meant to handle individual allocation failures, and has been since 2003, with test infrastructure to verify that it does (which makes the test suite annoyingly slow to run, and makes tests awkward to write, because every "assert success" in a test that exercises OOM turns into "if OOM occurred, end test successfully, else assert success"). Despite all that, we're *still* occasionally finding and fixing places where OOM isn't handled correctly.
The original author's article on this from 2008 <https://blog.ometer.com/2008/02/04/out-of-memory-handling...> makes interesting reading, particularly these:
> I wrote a lot of the code thinking OOM was handled, then later I added testing of most OOM codepaths (with a hack to fail each malloc, running the code over and over). I would guess that when I first added the tests, at least 5% of mallocs were handled in a buggy way
> When adding the tests, I had to change the API in several cases in order to fix the bugs. For example adding dbus_connection_send_preallocated() or DBUS_DISPATCH_NEED_MEMORY.
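A minimal C sketch of that "fail each malloc, run the code over and over" hack (test_malloc and exhaustively_test_oom are illustrative names, not the dbus test API) might look like:

    #include <stdlib.h>
    #include <stdbool.h>

    static long fail_after = -1;   /* -1 means: never inject a failure */
    static long alloc_count;

    /* The code under test calls this instead of malloc(). */
    void *test_malloc(size_t n)
    {
        if (fail_after >= 0 && alloc_count++ >= fail_after)
            return NULL;           /* injected OOM */
        return malloc(n);
    }

    /* Re-run the operation with allocation 0, 1, 2, ... failing, until a
     * run completes without reaching the injection point. Each failing run
     * must end with the operation reporting failure cleanly, which is
     * exactly the "if OOM occurred, end test successfully" pattern. */
    bool exhaustively_test_oom(bool (*op)(void))
    {
        for (long i = 0; ; i++) {
            alloc_count = 0;
            fail_after = i;
            bool ok = op();
            if (alloc_count <= fail_after)
                return ok;         /* no failure injected: final verdict */
            /* a failure was injected; leak/crash checks happen elsewhere */
        }
    }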
Posted Oct 8, 2020 15:22 UTC (Thu)
by khim (subscriber, #9252)
[Link] (12 responses)
It's like exception safety: it's insanely hard to retrofit onto an existing codebase; the Google C++ Style Guide expressly forbids exceptions largely for that reason. Yet if you use certain idioms and libraries, it becomes manageable.
If you want or need to handle the OOM case, the situation is similar: you structure your code to handle that case from the start… and suddenly it becomes much less troublesome to deal with.
I'm not sure Zig will manage to pull it off… but I wouldn't dismiss it for trying to solve this issue: many of the OOM-handling problems in existing applications and libraries come from designing an API for the usual “memory is infinite” world… and then trying to bolt OOM handling onto it afterwards. That doesn't work.
But go and check those old MS-DOS apps, which had to deal with limited memory. They handle it just fine; it's not hard to make them show a “couldn't allocate memory” message without crashing. And please don't say that people were different back then, that they could do it and today we have lost the art. That's just not true.
Posted Oct 8, 2020 22:28 UTC (Thu)
by roc (subscriber, #30627)
[Link] (11 responses)
MS-DOS apps were a lot simpler than what we have today, and they often did misbehave when you ran out of memory. Those that didn't often just allocated a fixed amount of memory at startup and were simple enough to ensure they worked within that limit, without handling individual allocation failures. For example, if you look up 'New' in the Turbo Pascal manual (chapter 15), you can see it doesn't even *mention* New returning OOM or how to handle it. The best you can do is call MaxAvail before every allocation, which I don't recall anyone doing. http://bitsavers.trailing-edge.com/pdf/borland/turbo_pasc...
Posted Oct 8, 2020 23:27 UTC (Thu)
by khim (subscriber, #9252)
[Link] (10 responses)
It's funny that you picked Turbo Pascal 3.0, the last version without proper care for the out-of-memory case. Even then it had the $K+ option, which was enabled by default and generated a runtime error when memory was exhausted.
If you open the very site you linked and look at the manual for Turbo Pascal 4.0, you'll find the HeapError error-handling routine there. The Turbo Vision manual even has a whole chapter 6 named “Writing safe programs”, complete with a “safety pool”, a “LowMemory” condition and so on. It worked. Turbo Pascal itself used it, and many other programs did, too. I'm not quite sure when the notion of “safe programming” was abandoned, but I suspect it was when Windows arrived. Partly because Windows itself handles OOM conditions poorly (why bother making your program robust if the whole OS will come crashing down on you when memory runs out?) and partly because it brought many new programmers to the PC who were happy to write programs that worked only sometimes, and who cared little about making them robust.
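A rough C analogue of that safety-pool idea (the helper names are hypothetical, and on an overcommitting OS malloc may never report failure this cleanly):

    #include <stdio.h>
    #include <stdlib.h>

    #define SAFETY_POOL_SIZE (64 * 1024)
    static void *safety_pool;

    void safety_pool_init(void)
    {
        safety_pool = malloc(SAFETY_POOL_SIZE);  /* reserve while memory is plentiful */
    }

    /* Called on an allocation failure: release the reserve so the error
     * path (dialog box, saving the document, ...) still has memory. */
    static int low_memory(void)
    {
        if (safety_pool == NULL)
            return 0;              /* reserve already spent: truly out of memory */
        free(safety_pool);
        safety_pool = NULL;
        fputs("warning: memory is low, save your work\n", stderr);
        return 1;                  /* worth retrying the allocation */
    }

    void *pool_malloc(size_t n)
    {
        void *p;
        while ((p = malloc(n)) == NULL)
            if (!low_memory())
                return NULL;       /* caller sees the real OOM */
        return p;
    }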
Ultimately there's nothing mystical about writing such programs. Sure, you need tests. Sure, you need a proper API. But hey, it's not as if you can handle other kinds of failures properly without tests, and it's not as if you don't need to think about your API to satisfy other kinds of requirements.
It's kind of a pity that Unix basically pushed us down the road of not caring about OOM errors with its fork/exec model. It's really elegant… yet really flawed. Once you go down that road, the only way to use all the available memory efficiently is overcommit, and once you have overcommit, malloc stops returning NULL and you get SIGSEGV at some random time instead… you can no longer write reliable programs, so people just stop writing reliable libraries, too. Your only hope at that point is something like what smartphones and routers are doing: split your hardware into two parts, put the “reliable” piece on one and the “fail-happy” piece on the other. People just have to live with the need for a hard reset at times.
But is that a good way to go for ubiquitous computing, where a failure and watchdog-induced reset may literally mean life or death? Maybe this two-part approach will scale. Maybe it won't. I don't know. Time will tell.
Posted Oct 9, 2020 3:08 UTC (Fri)
by roc (subscriber, #30627)
[Link] (1 responses)
> But hey, it's not as if you can handle other kinds of failures properly without tests and it's not as if you don't need to think about your API if you want to satisfy other kinds of requirements.
It sounds like you're arguing "You have to have *some* tests and *some* API complexity so why not just make those a lot more work".
Posted Oct 9, 2020 7:04 UTC (Fri)
by khim (subscriber, #9252)
[Link]
> It sounds like you're arguing "You have to have *some* tests and *some* API complexity so why not just make those a lot more work".
No. It's “a lot more work” only if you don't think about it upfront. It's funny that this link is used as an example of how hard it is to handle OOM, because it really shows how easy it is. “5% of mallocs were handled in a buggy way” means that 95% of them were handled correctly on the first try. That's a much higher success rate than most design decisions achieve.
Handling OOM conditions is not hard, really. It's only hard if you already have finished code designed for the “memory is infinite” world and want to retrofit OOM handling into it. Then it's really hard. The situation is closely analogous to thread safety, exception safety and many other such properties: design primitives that handle 95% of the work for you, and write tests to cover the remaining 5%.
Posted Oct 10, 2020 15:16 UTC (Sat)
by dvdeug (guest, #10998)
[Link] (7 responses)
Posted Oct 10, 2020 22:50 UTC (Sat)
by khim (subscriber, #9252)
[Link] (2 responses)
>It's quite possible you can't open a dialog box to tell the user of the problem without memory,
MacOS classic solved that by setting aside some memory for that dialog box.
>nor save anything.
Again: not a problem on classic MacOS, since there an application requests its memory upfront and then has to live within it. Other apps couldn't “steal” it.
>I suspect it was when Windows arrived
And made it impossible to reliably handle OOM, yes. Most likely.
>What was once one program running on an OS simple enough to avoid memory allocation is now a complex collection of individually more complicated programs on a complex OS.
More complex than a typical z/OS installation, which handles OOM just fine? I don't think so.
No, I think you are right: when Windows (the original one, not Windows NT 3.1, which handles OOM properly, too) and Unix (because of its fork/exec model) made it impossible to reliably handle OOM conditions, people stopped caring. SMP or general complexity had nothing to do with it. Just the general Rise of Worse is Better.
As I've said: it's not impossible to handle, and not even especially hard… but in a world where people have been trained to accept that programs may fail randomly for no apparent reason, that work just looks entirely unnecessary.
Posted Oct 11, 2020 4:25 UTC (Sun)
by dvdeug (guest, #10998)
[Link] (1 responses)
You could do that anywhere. Go ahead and allocate all the memory you need upfront.
> More complex than typical zOS installation? Which handles OOM just fine?
If it does, it's because it keeps things in nice neat boxes and runs on a closed set of IBM hardware, in a way that a desktop OS can't and doesn't. A kindergarten class at recess is more complex in some ways than a thousand military men marching in formation, because you never know when a kindergartner is going to punch another one or make a break for freedom.
> SMP or general complexity had nothing to do with it.
That's silly. If you're writing a game for a Nintendo or a Commodore 64, you know how much memory you have, and you will be the only program running. MS-DOS was slightly more complicated, with TSRs, but not by a whole lot. Things nowadays are complex: a message box calls into a windowing system and needs fonts and text shapers loaded into memory; the original MacOS didn't handle Arabic or Hindi or anything beyond 8-bit charsets. Modern systems have any number of processes popping up and going away, and even if you're, say, a word processor, that web browser or PDF reader may be as important as you are. Memory amounts vary all over the place, memory usage varies all over the place, and a function telling you how much memory is left won't tell you anything particularly useful about what will be happening sixty seconds from now. What was once a tractable problem, telling how much memory is available, is now completely unpredictable.
> Just general Rise of Worse is Better.
To quote that essay: "However, I believe that worse-is-better, even in its strawman form, has better survival characteristics than the-right-thing, and that the New Jersey approach when used for software is a better approach than the MIT approach." The simple fact is you're adding a lot of complexity to your system; there's a reason why so much code is written in memory-managed languages like Python, Go, Java, C# and friends. You're spending a lot of programmer time to solve a problem that rarely comes up and that you can't do much about when it does. (If it might be important, regularly autosave a recovery file; OOM is not the only or even most frequent reason your program or the system as a whole might die.)
> in a world where people just trained to accept the fact that programs may fail randomly for no apparent reason
How, exactly, is showing a message box that says “ERROR: computer jargon” going to help with that? Because that's all most people will read. There is no way to fix the problem that failing to open a new tab or a new file because the program is out of memory will still read as “failing randomly for no apparent reason” to most people.
I fully believe you could do better, but it's like BeOS: it was a great OS, but when it became widely available in 1998, the choice was between Windows 98 and an OS whose browser couldn't deal with the Web as it was in 1998, and people went with Windows 98. Worse-is-better in a nutshell.
Posted Oct 11, 2020 19:49 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
Like another saying: "the wrong decision is better than no decision". Just making a decision NOW can be very important. If you don't pick a direction to run, any direction, when a bear is after you, then you very quickly won't need to make any further decisions!
Cheers,
Wol
Posted Oct 11, 2020 12:38 UTC (Sun)
by quboid (subscriber, #54017)
[Link] (3 responses)
The 16-bit Windows SDK had a tool called STRESS.EXE which, among other things, could cause memory allocation failures so you could check that your program coped with them correctly.
16-bit Windows required large memory allocations (GlobalAlloc) to be locked while in use and unlocked when not, so that Windows could move memory around without an MMU. It was even possible to mark allocated memory as discardable, in which case you didn't know whether you'd still have it when you locked it to use it again. This was great for caches, and it's a feature I wish my web browser had today. :-)
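The pattern, sketched in Win16-style C (cache_fill is an illustrative helper and error handling is abbreviated):

    #include <windows.h>

    #define CACHE_BYTES 32768

    void cache_fill(char *p);   /* recomputes the cached data (illustrative) */

    static HGLOBAL hCache;

    void cache_init(void)
    {
        /* Movable + discardable: Windows may relocate the block, or throw
         * it away entirely under memory pressure, while it is unlocked. */
        hCache = GlobalAlloc(GMEM_MOVEABLE | GMEM_DISCARDABLE, CACHE_BYTES);
    }

    void cache_use(void)
    {
        char *p = (char *)GlobalLock(hCache);    /* pin it at a fixed address */
        if (p == NULL) {
            /* The block was discarded: give it memory again and rebuild it. */
            HGLOBAL h = GlobalReAlloc(hCache, CACHE_BYTES, GMEM_MOVEABLE);
            if (h == NULL)
                return;                          /* genuinely out of memory */
            hCache = h;
            p = (char *)GlobalLock(hCache);
            if (p == NULL)
                return;
            cache_fill(p);
        }
        /* ... read or update the cached data through p ... */
        GlobalUnlock(hCache);                    /* let Windows move it again */
    }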
Mike.
Posted Oct 11, 2020 21:14 UTC (Sun)
by dtlin (subscriber, #36537)
[Link]
Posted Oct 11, 2020 22:28 UTC (Sun)
by roc (subscriber, #30627)
[Link] (1 responses)
Posted Oct 15, 2020 16:19 UTC (Thu)
by lysse (guest, #3190)
[Link]
Posted Oct 8, 2020 19:23 UTC (Thu)
by excors (subscriber, #95769)
[Link] (1 responses)
E.g. FreeRTOS can be used with partly or entirely static allocation (https://www.freertos.org/Static_Vs_Dynamic_Memory_Allocat...). Your application can implement a new thread as a struct/class that contains an array for the stack, a StaticTask_t, and a bunch of queues and timers and mutexes and whatever. You pass the memory into FreeRTOS APIs which connect it to other threads with linked lists, so FreeRTOS doesn't do any allocation itself but doesn't impose any hardcoded bounds. And since you know your application will only have one instance of that thread, it can be statically allocated and the linker will guarantee there's enough RAM for it.
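A minimal sketch of that style, assuming configSUPPORT_STATIC_ALLOCATION is enabled (the task, queue and buffer names are illustrative):

    #include "FreeRTOS.h"
    #include "task.h"
    #include "queue.h"

    #define STACK_DEPTH 256   /* in words (StackType_t), not bytes */
    #define QUEUE_LEN   8

    /* All memory is provided by the application and sized by the linker. */
    static StackType_t  workerStack[STACK_DEPTH];
    static StaticTask_t workerTCB;

    static uint8_t       msgStorage[QUEUE_LEN * sizeof(uint32_t)];
    static StaticQueue_t msgQueueBuf;
    static QueueHandle_t msgQueue;

    static void workerTask(void *params)
    {
        uint32_t msg;
        (void)params;
        for (;;) {
            if (xQueueReceive(msgQueue, &msg, portMAX_DELAY) == pdTRUE) {
                /* handle msg ... */
            }
        }
    }

    void app_start(void)
    {
        /* Neither call can fail for lack of memory: the buffers already exist. */
        msgQueue = xQueueCreateStatic(QUEUE_LEN, sizeof(uint32_t),
                                      msgStorage, &msgQueueBuf);
        xTaskCreateStatic(workerTask, "worker", STACK_DEPTH, NULL,
                          tskIDLE_PRIORITY + 1, workerStack, &workerTCB);
        vTaskStartScheduler();
    }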
In terms of the application's call graph, you want to move allocations (and therefore the possibility of allocation failure) as far away from the leaf functions as possible: just do a few big allocations at a high level, where it's easier to unwind. The leaf functions include the OS, the language's standard library, logging functions and so on, so you really need those to be designed not to allocate dynamically themselves; otherwise you have no hope of making this work.
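A toy C illustration of that shape, with allocation hoisted out of the leaf entirely (names are hypothetical):

    #include <stdio.h>
    #include <stddef.h>

    /* Leaf: formats into caller-provided memory, so it has no OOM path. */
    void format_reading(char *buf, size_t len, int sensor, double value)
    {
        snprintf(buf, len, "sensor %d: %.2f", sensor, value);
    }

    int run(void)
    {
        /* One statically allocated buffer at the top of the call graph;
         * the linker guarantees it exists, so nothing below can fail. */
        static char scratch[4096];
        format_reading(scratch, sizeof scratch, 7, 21.5);
        puts(scratch);
        return 0;
    }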
The C++ standard library is bad at that, but the language gives you reasonable tools to implement your own statically-allocated containers (in particular using templates for parameterised sizes; it's much more painful in C without templates). From an extremely brief look at Zig, it appears to have similar tools (generics with compile-time sizes) and at least some of the standard library is designed to work with memory passed in by the caller (and the rest lets the caller provide the dynamic allocator). Rust presumably has similar tools, but I get the impression a lot of the standard library relies on a global allocator and has little interest in providing non-allocating APIs.
It's not always easy to write allocation-free code, and it's not always the most memory-efficient (because if your program uses objects A and B at non-overlapping times, it'll statically allocate A+B instead of dynamically allocating max(A,B)), but sometimes it is feasible and it's really nice to have the guarantee that you will never have to debug an out-of-memory crash. And even if you can't do it for the whole application, you still get some benefit from making large parts of it allocation-free.
(This is for code that's a long way below "complex applications" from a typical Linux developer's perspective. But nowadays there's still a load of development for e.g. IoT devices where memory is limited to KBs or single-digit MBs, implementing complicated protocols across a potentially hostile network, so it's a niche where a language that's nicer and safer than C/C++ but no less efficient would be very useful.)
Posted Oct 8, 2020 22:33 UTC (Thu)
by roc (subscriber, #30627)
[Link]
There is a growing ecosystem of no-allocation Rust libraries, and Rust libraries that can optionally be configured to not allocate. These are "no-std" (but still use "core", which doesn't allocate). https://lib.rs/no-std
Rust const generics (getting closer!) will make static-allocation code easier to write in Rust.