Zig heading toward a self-hosting compiler

Posted Oct 7, 2020 18:52 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
In reply to: Zig heading toward a self-hosting compiler by khim
Parent article: Zig heading toward a self-hosting compiler

> Rust the language should, in theory, be fine, but Rust's standard library is not really designed for that, which, for all practical purposes, makes an event of “running out of memory” very hard to handle
You can't realistically get an "allocation failed" situation in Linux, because of all the overcommit.

So this mostly leaves the small constrained devices. And it's not really relevant there either. It's very hard to dig yourself out of the OOM hole, so you write the code to avoid getting there in the first place.



Zig heading toward a self-hosting compiler

Posted Oct 7, 2020 20:04 UTC (Wed) by ballombe (subscriber, #9523) (9 responses)

You can disable overcommit, see /proc/sys/vm/overcommit_memory

Zig heading toward a self-hosting compiler

Posted Oct 7, 2020 20:07 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) (8 responses)

It doesn't actually disable it. You will still get killed by the OOM killer rather than get NULL from malloc(). In my experience, to force malloc() on Linux to return NULL, you need to disable overcommit and try a really large allocation.

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 13:17 UTC (Fri) by zlynx (guest, #2285) (7 responses)

I don't think you set it correctly, then, because strict commit definitely works. I run my servers that way.

You have to read the documentation pretty carefully because there are actually three modes: 0 for heuristic, 1 for overcommit anything, and 2 for strict commit (well, strict depending on the overcommit_ratio value).
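
For reference, here is a quick way to check which mode a box is actually running (an illustrative sketch only; the /proc path is the standard knob, but the program around it is mine):

#include <stdio.h>

int main(void) {
  FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
  int mode;
  if (f == NULL || fscanf(f, "%d", &mode) != 1) {
    perror("overcommit_memory");
    return 1;
  }
  fclose(f);
  /* 0 = heuristic, 1 = always overcommit, 2 = strict commit */
  printf("overcommit_memory = %d\n", mode);
  return 0;
}

Writing 0, 1 or 2 back to the same file (as root) switches modes.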

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 17:40 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) (6 responses)

Strict commit works, sure, in the sense that the OOM killer will come out immediately rather than later.

As I've shown, there's simply no way to get -ENOMEM out of sbrk(), to take one example.

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 18:25 UTC (Fri) by zlynx (guest, #2285) (5 responses)

And yet, it does do it somehow. I just wrote a little C program to test it, and tried it on my laptop and one of my servers.
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a size argument with an optional k/m/g suffix. */
intptr_t arg_to_size(const char *arg) {
  assert(sizeof(intptr_t) == sizeof(long));

  errno = 0;
  char *endp;
  long result = strtol(arg, &endp, 0);
  if (errno) {
    perror("strtol");
    exit(EXIT_FAILURE);
  }
  if (*endp != '\0') {
    switch (*endp) {
    default:
      exit(EXIT_FAILURE);
      break;
    case 'k':
      result *= 1024;
      break;
    case 'm':
      result *= 1024 * 1024;
      break;
    case 'g':
      result *= 1024 * 1024 * 1024;
      break;
    }
  }
  return result;
}

int main(int argc, char *argv[]) {
  if (argc < 2)
    exit(EXIT_FAILURE);
  intptr_t inc = arg_to_size(argv[1]);
  if (inc < 0)
    exit(EXIT_FAILURE);

  printf("allocating 0x%lx bytes\n", (long)inc);
  /* Try to extend the heap by the whole amount with a single sbrk() call. */
  void *prev = sbrk(inc);
  if (prev == (void *)(-1)) {
    perror("sbrk");
    exit(EXIT_FAILURE);
  }

  return EXIT_SUCCESS;
}
On a 32 GiB server with strict overcommit:
$ ./sbrk-large 24g
allocating 0x600000000 bytes

$ ./sbrk-large 28g
allocating 0x700000000 bytes
sbrk: Cannot allocate memory
Here are the interesting bits from the strace on the strict-commit server for ./sbrk-large 32g. You can see that sbrk() is emulated by querying the current brk and adding the increment to it; when the second brk() call shows that the break did not move, the wrapper returns an error code.
brk(NULL)                               = 0x1d71000
brk(0x801d71000)                        = 0x1d71000
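In other words, the userspace wrapper does something like this (a simplified sketch of how a libc might layer sbrk() on the raw brk syscall; it is not the actual glibc or musl source):

#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

void *sbrk_sketch(intptr_t increment) {
  /* The raw brk syscall returns the current break when it refuses a
     request, so brk(0) is a handy way to query it on Linux. */
  uintptr_t oldbrk = (uintptr_t)syscall(SYS_brk, 0);
  uintptr_t newbrk = (uintptr_t)syscall(SYS_brk, oldbrk + increment);
  if (newbrk != oldbrk + increment) {
    /* The break did not move: the kernel refused, libc invents ENOMEM. */
    errno = ENOMEM;
    return (void *)-1;
  }
  return (void *)oldbrk;
}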
And here it is on the laptop after turning on full overcommit. Heuristic mode was failing on the big numbers, but with overcommit_memory set to 1 there were no problems.
$ ./sbrk-large 64g
allocating 0x1000000000 bytes

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 18:28 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) (4 responses)

Try allocating in small increments, instead of a huge allocation that blows past the VMA borders.

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 19:14 UTC (Fri) by zlynx (guest, #2285) (3 responses)

With sbrk it won't make any difference. It's a single contiguous memory block.

I'm not even writing into it. It's the writing that triggers OOM. The Linux OOM system is happy to let you have as much virtual memory as you want as long as you don't use it.

But as you can see, when I exceed the amount of available RAM in a single allocation (free -g says there's 27g available) on the server with strict overcommit, it fails immediately.
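
The distinction is easy to demonstrate (a hypothetical little test; under full overcommit the reservation succeeds, and it is only faulting the pages in that can summon the OOM killer):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  size_t sz = (size_t)64 << 30;  /* 64 GiB of address space */
  char *p = malloc(sz);          /* typically succeeds under overcommit */
  if (p == NULL) {
    perror("malloc");
    return EXIT_FAILURE;
  }
  puts("reservation succeeded, now touching the pages...");
  memset(p, 1, sz);  /* faulting the pages in is what consumes real RAM */
  puts("survived");
  return EXIT_SUCCESS;
}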

Zig heading toward a self-hosting compiler

Posted Oct 11, 2020 17:47 UTC (Sun) by epa (subscriber, #39769) (2 responses)

> It's the writing that triggers OOM.
Isn't that exactly the point? If the memory isn't actually available, the allocation appears to succeed, but then blows up when you try to use it. There is no way to say "please allocate some memory, and I do intend to use it, so if we're out of RAM tell me now (I'll cope), and if not, please stick to your promise that the memory exists and can be used".

It's good that a single massive allocation returns failure, but that does not come close to having a reliable failure mode in all cases.

Zig heading toward a self-hosting compiler

Posted Oct 11, 2020 18:29 UTC (Sun) by zlynx (guest, #2285) (1 response)

With strict commit, any allocation that succeeds is guaranteed to be available. You won't get the OOM killer killing anything when the memory is used. That's why I run my servers that way. Server applications tend to be built to handle memory allocation failures.

Unless it's Redis. You have to run Redis with full overcommit enabled.

Zig heading toward a self-hosting compiler

Posted Oct 18, 2020 15:06 UTC (Sun) by epa (subscriber, #39769)

Thanks, sorry I misunderstood your earlier comment.

Zig heading toward a self-hosting compiler

Posted Oct 8, 2020 9:46 UTC (Thu) by khim (subscriber, #9252) (3 responses)

> It's very hard to dig yourself out of the OOM hole, so you write the code to avoid getting there in the first place.

Practically speaking, you end up in a situation where you need to hit the reset switch (or wait for the watchdog to kill you) anyway.

This may be an OK approach for a smartphone or even your PC. But IoT with this approach is a disaster waiting to happen (read about Beresheet to see how that works out, ultimately).

So yeah, Zig is "worth watching". Let's see how it works out.

Zig heading toward a self-hosting compiler

Posted Oct 8, 2020 11:23 UTC (Thu) by farnz (subscriber, #17727) (2 responses)

IME, having worked in the sort of environment where you can't OOM safely, you don't actually care that much about catching allocation failures at the point of allocation; the Rust approach of unwinding to an exception handler via catch_unwind is good enough for allocation failures.

The harder problem is to spend a lot of effort bounding your memory use at compile time, allowing for things like fragmentation. Rust isn't quite there yet (notably I can't use per-collection allocation pools to reduce the impact of fragmentation).

Zig heading toward a self-hosting compiler

Posted Oct 8, 2020 15:04 UTC (Thu) by khim (subscriber, #9252) (1 response)

Yup. And as I wrote right in the initial post (just maybe wasn't clear enough): it's not even a question of language design, but more of the features of its standard library.

Both Rust and C++ should, in theory, support designing for limited memory. But both have standard libraries which assume that memory is endless and that, if we ever run out of memory, it's OK to crash. And now, with C++20, C++ has finally got language constructs which deliver significant functionality not easily achievable by other methods, yet which rely on that “memory is endless and if it ever runs out then it's OK to crash” assumption.

So Zig is definitely covering a unique niche, which is not insignificant. But only time will tell if it's large enough to sustain it.

Zig heading toward a self-hosting compiler

Posted Oct 11, 2020 17:52 UTC (Sun) by epa (subscriber, #39769)

It's unfashionable to write programs with fixed-size buffers or arbitrary limits, but I think that would often be a way to get better reliability in more "static" applications where the workload is known in advance. Of course, you need to fail gracefully when the buffer is full or the limit is reached, but you can write test cases for that, certainly a lot more easily than you can write test cases for running out of memory at every single dynamic allocation in the codebase or, even worse, for being OOM-killed at any arbitrary point.
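
A minimal sketch of that style (hypothetical code; the point is that the failure path is an ordinary return value you can unit-test):

#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 1024

struct queue {
  int items[QUEUE_CAP];  /* storage reserved up front, no dynamic allocation */
  size_t len;
};

/* Returns false when the limit is hit; the caller degrades gracefully. */
bool queue_push(struct queue *q, int item) {
  if (q->len == QUEUE_CAP)
    return false;
  q->items[q->len++] = item;
  return true;
}

A test just fills the queue to QUEUE_CAP and checks that the next push returns false, which is far easier than simulating a kernel OOM.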

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 3:39 UTC (Fri) by alkbyby (subscriber, #61687) (1 response)

Not entirely true. Programs may, and sometimes do, have their own limits on total malloc()ed size, and that is very useful sometimes. And you already posted below that larger allocations can fail (though not entirely correctly, btw; actually, even with default overcommit, larger allocations or forks may fail).
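
Something along these lines, say (a hypothetical wrapper with a self-imposed cap; real code would also have a matching free path that credits the budget back):

#include <stdlib.h>

static size_t used_bytes;
static const size_t limit_bytes = (size_t)256 << 20;  /* self-imposed 256 MiB cap */

/* Fails deterministically at the cap, long before the kernel is involved. */
void *capped_malloc(size_t n) {
  if (n > limit_bytes - used_bytes)
    return NULL;
  void *p = malloc(n);
  if (p != NULL)
    used_bytes += n;
  return p;
}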

Zig heading toward a self-hosting compiler

Posted Oct 9, 2020 3:58 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

Some years ago I did try to test my program for OOM robustness. It was a regular C++ program compiled against glibc. I was not actually able to get allocations to fail! Instead the OOM killer usually just murdered something unrelated.

Out of curiosity I decided to look at how allocators are implemented. I didn't want to wade through the glibc source code, so I looked at musl. The allocator there uses the good old brk syscall to expand the heap (and direct mmap for large allocations).

Yet the brk() implementation in the Linux kernel does _not_ support returning ENOMEM: https://elixir.bootlin.com/linux/latest/source/mm/mmap.c#... Even if you lock the process into RAM via mlockall(MCL_FUTURE), brk() will simply run the infallible mm_populate(), which will wake the OOM killer if it's out of RAM.

You certainly can inject failures by writing your own allocator, but for regular glibc- or musl-based code it's simply not going to happen.
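
For testing, that can be as small as this (a hypothetical sketch; real harnesses usually interpose malloc() via LD_PRELOAD or the linker's --wrap rather than renaming calls):

#include <stdlib.h>

static unsigned long allocation_count;
static const unsigned long fail_every = 1000;

/* Deterministically fail every Nth allocation to exercise the error paths. */
void *failing_malloc(size_t n) {
  if (++allocation_count % fail_every == 0)
    return NULL;
  return malloc(n);
}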

