Time To Get Rid Of errno

Posted Aug 21, 2015 0:11 UTC (Fri) by ldo (guest, #40946)
Parent article: Glibc wrappers for (nearly all) Linux system calls

The fundamental problem is that the errno convention has outlived its usefulness. The Linux kernel calls return an error code directly, but to be POSIX-compatible, glibc has to squirrel these away in errno. Which requires all this complicated wrapper code, as well as a whole extra mechanism to make errno thread-safe.

I recently hit the situation where the write(2) call didn’t write all the bytes I gave it to disk, with no error indication in errno. The man page only says this can happen

... if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes.

In other words, you don’t know why it happened.

Time To Get Rid Of errno

Posted Aug 21, 2015 0:16 UTC (Fri) by dlang (guest, #313) [Link]

I think that you'll find that what the kernel returns doesn't tell you why either.

It has nothing to do with errno, and a lot to do with the fact that there are a LOT of possible things that can cause write() to not write everything out, and the number of possible reasons is going to increase over time. How many thousands of error messages do you want to have to define (and then handle)?

Time To Get Rid Of errno

Posted Aug 21, 2015 0:31 UTC (Fri) by proski (subscriber, #104) [Link] (9 responses)

C functions return one value; it's a language limitation. If two values are returned, one goes to a thread-local variable. Are you suggesting changing the API to use explicit stack variables for additional outputs? Do you have any evidence that it would speed up the software? Introducing a whole new API (POSIX cannot be just removed) would need a very good justification.

Time To Get Rid Of errno

Posted Aug 21, 2015 3:35 UTC (Fri) by wahern (subscriber, #37304) [Link] (8 responses)

True, but C functions can return compound objects.

struct writeret { size_t n; int err; };  /* member is "err": errno is a macro and cannot be a member name */
struct writeret writex(int, const void *, size_t);

Modern ABIs would return the values in registers, and an optimizing compiler could elide the existence of an independent struct writeret object altogether. That kernels still only return a single integer value is more about not evolving with the times. Pre-ANSI C didn't permit passing compound objects by value, only pointers, so ABIs and compilers didn't have to consider optimizing that case. In 1989 ANSI C changed that to permit passing structs and unions, but not arrays. For a long time ABIs and compilers would always pass the values on the stack, and it was considered poor practice to make use of the feature in performance-critical code. But modern ABIs (e.g. AMD64) can pass the member values through registers. So there's no cost to using smallish structs as function parameters or return values.
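
As a rough sketch of what such a wrapper could look like in userspace today: the code below layers the hypothetical writex() on top of the ordinary write(2). Neither writex() nor struct writeret exists in any libc; the member is called err because errno itself is a macro. It only shows the calling convention, since the error still passes through the thread-local errno on its way out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result type; "err" because errno is a macro. */
struct writeret {
    size_t n;   /* bytes written, 0 on failure */
    int err;    /* 0 on success, otherwise an errno value */
};

/* Hypothetical wrapper around write(2); not a real libc function. */
static struct writeret writex(int fd, const void *buf, size_t count)
{
    struct writeret r = { 0, 0 };
    ssize_t rc;

    assert(buf != NULL || count == 0);
    rc = write(fd, buf, count);
    if (rc < 0)
        r.err = errno;      /* copy the error out of the thread-local */
    else
        r.n = (size_t)rc;   /* may be less than count (short write) */
    return r;
}
```

A caller then checks r.err instead of errno, and a small struct like this would normally come back in registers on AMD64.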

Time To Get Rid Of errno

Posted Aug 21, 2015 8:36 UTC (Fri) by ehiggs (subscriber, #90713) [Link]

To go along with this change, it would be nice if C could also destructure return values of anonymous structs. e.g.

struct { size_t n; int err; } write(int fd, const void *buf, size_t count);

(n, errno) = write(fd, buf, count);

Time To Get Rid Of errno

Posted Aug 21, 2015 22:04 UTC (Fri) by kleptog (subscriber, #1183) [Link] (5 responses)

> That kernels still only return a single integer value is more about not evolving with the times.

Well, and the fact that you can't just change the userland/kernel interface like that. Promises have been made about what happens to all the registers and there isn't anywhere to put any extra return values in a backward compatible way.

On top of that, even if the kernel could return the info, you can't change the POSIX API either so errno is here to stay.

Time To Get Rid Of errno

Posted Aug 21, 2015 23:23 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Unless you simply skip libc and start calling the syscalls directly, like Go does.

Time To Get Rid Of errno

Posted Aug 22, 2015 9:06 UTC (Sat) by kleptog (subscriber, #1183) [Link] (2 responses)

Of course, if a language wants to skip libc then they can. That just reinforces the fact that the syscall interface can never be changed to return more information.

FWIW I disagree with the OP, it is nice to be able to know that a syscall either succeeded or failed and not some halfway state. If you did a write and the write was short the write still succeeded. POSIX does specify that if the write size is less than PIPE_BUF length then it will succeed or fail atomically. If you have to write your code to handle the case where some data was written but you also have to handle an error code, that just feels more fragile.

All the cases where it would be useful to return more information the specific syscall has made allowances for it, for example recvmsg(). The fact that there are syscalls that are badly designed is a problem with the interface and not the mechanism. I find the ip/tc tools use of netlink here pretty bad, they return EINVAL and you have to hope there's something useful in the kernel log. Would it have killed them to add an extra field for "extended error code"?

Time To Get Rid Of errno

Posted Aug 23, 2015 8:03 UTC (Sun) by lsl (subscriber, #86508) [Link] (1 responses)

> POSIX does specify that if the write size is less than PIPE_BUF length then it will succeed or fail atomically.

Only when actually writing to a pipe, not in the general case.

I just discovered that Linux (since 3.4) implements a Plan-9-style pipe mode where reads from a pipe match up with previous writes (provided the latter weren't greater than PIPE_BUF bytes). See the pipe(2) Linux manpage for the O_DIRECT flag to pipe2. Very nice.

Time To Get Rid Of errno

Posted Aug 23, 2015 10:35 UTC (Sun) by kleptog (subscriber, #1183) [Link]

> I just discovered that Linux (since 3.4) implements a Plan-9-style pipe mode where reads from a pipe match up with previous writes (provided the latter weren't greater than PIPE_BUF bytes). See the pipe(2) Linux manpage for the O_DIRECT flag to pipe2. Very nice.

How is this different to socketpair(AF_UNIX, SOCK_DGRAM) ? The only reason I can think of is that you want it to work on systems without UNIX domain sockets...
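
For comparison, here is a small sketch of the socketpair side of that question. It sends two messages over an AF_UNIX datagram pair and records how many bytes each read returns; the helper name dgram_boundaries() is made up for illustration. It says nothing about the pipe packet mode itself, only that the datagram pair keeps the same write/read pairing.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send a 3-byte and a 5-byte message over a
 * datagram socketpair and store the size of each read in lens[].
 * Returns 0 on success, -1 on any failure.
 */
static int dgram_boundaries(ssize_t lens[2])
{
    int sv[2];
    char buf[64];
    int rc = -1;

    assert(lens != NULL);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (write(sv[0], "abc", 3) == 3 && write(sv[0], "defgh", 5) == 5) {
        /* Each read returns one whole message, even though buf is larger. */
        lens[0] = read(sv[1], buf, sizeof buf);
        lens[1] = read(sv[1], buf, sizeof buf);
        rc = (lens[0] >= 0 && lens[1] >= 0) ? 0 : -1;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```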

Time To Get Rid Of errno

Posted Aug 22, 2015 2:58 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

I'm looking forward to Mill and syscalls being a "portal" call (they act just like a function call and can return multiple values that way).

Time To Get Rid Of errno

Posted Aug 28, 2015 12:44 UTC (Fri) by justincormack (subscriber, #70439) [Link]

Actually NetBSD returns both the values of pipe(2) in registers, so there are two return values, and has done for a very long time. Multiple return values are such a sane nice feature that it is unfortunate that C does not have support.

Time To Get Rid Of errno

Posted Aug 21, 2015 0:35 UTC (Fri) by ncm (guest, #165) [Link] (2 responses)

So, check that return value, and try again if you care. Code that depends on why it failed is typically wrong.

Time To Get Rid Of errno

Posted Aug 21, 2015 1:46 UTC (Fri) by k8to (guest, #15413) [Link] (1 responses)

Aside from EINTR cases and similar?

There are some harder-to-get-right network situations where you want to differentiate between trying again and not trying again, and not differentiating doesn't give you good behavior. But maybe you mean most of the time we screw that up too?

errno, schmerrno.

Posted Aug 21, 2015 2:29 UTC (Fri) by ncm (guest, #165) [Link]

EINTR is a familiar one. And there are some well-understood processes that want a light touch. But overwhelmingly, the right thing to do in the face of failure is to back off, report it, and wait for somebody to ask to try again after they've re-plugged the cable or entered the right address or something. Have everything already set up for a failure before you go in, so you only need to do more if it succeeds, not if it fails. In the real world of system calls there are so many things to go wrong that the odds that whatever clever response you've coded won't make things worse are heavily against you.

Time To Get Rid Of errno

Posted Aug 21, 2015 2:20 UTC (Fri) by deater (subscriber, #11746) [Link]

perf_event_open() is pretty bad about this too, where EINVAL can mean one of scores of possible things. Often you have to resort to sticking printks in the kernel to find out what combination of 40-some parameters is wrong.

There is work underway though to address this, by improving the syscall error handling. I'm not sure how generic of a solution it is though.
https://lwn.net/Articles/652326/
It will be interesting if that actually gets merged.

Time To Get Rid Of errno

Posted Aug 21, 2015 2:35 UTC (Fri) by gutschke (subscriber, #27910) [Link] (1 responses)

My problem is not so much with errno and short read()/write() calls. That behavior is maybe surprising, but it is well-defined, and it has been well-defined pretty much ever since there was such a thing as UNIX.

I have a completely different issue with "errno". The bulk of the time that I had to make raw system calls has been in extremely low-level code. When the code executes, I can't make much of a guarantee about the execution environment. Quite frequently, there is no such thing as an "errno" variable (e.g. because I just called clone(), and didn't set up thread local storage yet).

This means, I would need libinux to have zero dependencies on any libc code. No accesses to "errno", no accesses to thread local storage, no cancellation points, no locking, no calls to atfork handlers, nothing! But things get even more complicated than that. By default, the dynamic linker lazily resolves library functions. This means, whenever I make a call into libinux, there is a chance that the dynamic loader gets called and makes all sorts of calls that are incompatible with my particular requirements.

In other words, all of libinux would either need to be inline functions, or there needs to be a way to fully resolve its symbols on demand. It is quite possible that my needs are a little unusual, as I have been writing very low-level and Linux specific code. But that's probably something that people will end up wanting to use libinux for.

Other than that, yes, I am fully in favor of libinux giving easy and direct access to all Linux system calls. That feature is long overdue and would be very welcome. I also feel that having wrapper functions that make system calls easier to use is wonderful. I sometimes need the exact raw system calls; and when I do, I am OK with researching the idiosyncrasies of the kernel API and making sure I get things right. But most of the time, I don't need this much control and I actually appreciate having helpers that allow the compiler to make sure I don't do anything stupid.

Time To Get Rid Of errno

Posted Aug 21, 2015 23:39 UTC (Fri) by ncm (guest, #165) [Link]

I can barely imagine dynamic-linking this library. Let the linker fully-resolve it. I agree it shouldn't depend on anything else, but I can't see any reason why it ever would.

Time To Get Rid Of errno

Posted Aug 21, 2015 6:36 UTC (Fri) by epa (subscriber, #39769) [Link]

Crikey! So all the contortion that C programs and the C library have to do to save and keep track of the value of errno (especially for threading, as you noted) is in fact quite unnecessary?

Is there scope for adding non-errno versions of calls to POSIX?

Time To Get Rid Of errno

Posted Aug 21, 2015 7:26 UTC (Fri) by Yorick (guest, #19241) [Link]

The write design may not be wonderful, but the standard procedure upon a short write is to try again with the remainder of the data to be written. Then you will get the real reason in errno (unless it was just a temporary condition the first time).
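
A minimal sketch of that retry procedure, assuming a blocking file descriptor. The helper name write_all() and its out-parameter are my own choices, not a standard interface. On failure, errno is whatever the failing write(2) left there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes of buf to fd, retrying after
 * short writes and EINTR. Returns 0 on success, or -1 with errno set by
 * the write(2) that failed. *written (if non-NULL) receives the number
 * of bytes that did get written.
 */
static int write_all(int fd, const void *buf, size_t len, size_t *written)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t rc = write(fd, p + done, len - done);
        if (rc < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            break;              /* real error; errno says why */
        }
        done += (size_t)rc;     /* short write: go around with the remainder */
    }
    if (written != NULL)
        *written = done;
    return done == len ? 0 : -1;
}
```

If the disk is full, the first call inside the loop may return a short count and the next one fails with ENOSPC, which is the "real reason" showing up on the retry.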

Time To Get Rid Of errno

Posted Aug 21, 2015 21:02 UTC (Fri) by vapier (guest, #15768) [Link] (2 responses)

the syscall ABI isn't really all that better. if you think using syscall(__NR_xxx) is the solution to all of your problems, then you haven't looked beyond the x86 cpu on your desktop ;).

some functions, like ptrace(), have overlap between valid values and errors. in some cases it returns arbitrary data, so you cannot know whether 0xffffffff is because the data was 0xffffffff or -EPERM (on a 32bit system). you simply have to make assumptions that it's always valid based on other syscalls.

glibc doesn't treat *all* negative values as being errors -- it caps it at different values. on x86, it treats [-1,-4095] (or should it be [-4095,-1] ?) as an errno value. that way you aren't limiting yourself to 31bits, but (2^32 - 4096) possible valid values.

further, the convention for returning errno values isn't consistent across architectures. some (most) will normalize into one register, but a few split it -- at least ia64 & mips do. that way there is no confusion whether there was an error.

further further, some syscalls have to deal with raw C calling conventions. namely, some ABIs (like arm, mips, and ppc) require uint64_t to be split on even/odd pairs. so instead of doing:
syscall(SYS_readahead, fd, (uint32_t)(offset), (uint32_t)(offset >> 32), count)
you have to insert a 0 after the fd by hand:
syscall(SYS_readahead, fd, 0, (uint32_t)(offset), (uint32_t)(offset >> 32), count)

further further further, just because you call a specific syscall by name, it does not mean it's going to be the same across architectures. alpha is a pretty big example of this -- there are many syscalls that don't exist like __NR_getpid. instead they named it __NR_getxpid. they made a lot of decisions so as to be compatible with OSF (after all, surely OSF is more important than this toy "linux" project, and will obviously outlive it). or g'luck trying to do something as simple as mmap -- there's __NR_mmap, __NR_mmap2, and arches are not consistent as to how the offsets are used (maybe they're shifted ?).

the syscall(2) man page has a lot of good discussion in it.

Time To Get Rid Of errno

Posted Aug 22, 2015 13:52 UTC (Sat) by hrw (subscriber, #44826) [Link] (1 responses)

> the syscall ABI isn't really all that better. if you think using syscall(__NR_xxx) is the solution to all of your problems, then you haven't looked beyond the x86 cpu on your desktop ;).

Should I use __NR_stat, __NR_fstat, __NR_fstat64 or maybe __NR_newfstatat? Will my code run properly on all architectures if I use one of them or should I add some #ifdefs for architecture checks?

Even x86 has 3 architectures now (x86, x86-64, x32) which have different sets of syscalls defined.

Time To Get Rid Of errno

Posted Sep 23, 2015 3:24 UTC (Wed) by vapier (guest, #15768) [Link]

you should use the symbols provided by the C library. stat() if you have a path string, fstat() if you have a fd, and fstatat() if you have a dir fd.

as for the syscalls you quoted, there are other stat variants (stat64 and fstatat64 at least). there's really no guarantee your code will compile or run properly when you call the syscalls directly. C libraries provide stable APIs/ABIs, including emulating newer functionality when the kernel is old (e.g. the *at syscalls could be emulated in userspace when the kernel was too old by utilizing /proc/self/fd/, but you'd have to call glibc's fstatat and not the kernel's syscall(__NR_newfstatat)).
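
a quick sketch of the three library entry points side by side, for what it's worth. the helper stat_three_ways() is made up for this example; it looks up the same file by full path, by open fd, and by dir fd plus name, and leaves it to the caller to compare the results.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: stat the file "name" inside "dirpath" three ways.
 * out[0] comes from stat(), out[1] from fstat(), out[2] from fstatat().
 * Returns 0 if all three calls succeed, -1 otherwise.
 */
static int stat_three_ways(const char *dirpath, const char *name,
                           struct stat out[3])
{
    char full[4096];
    int n, dirfd, fd, rc = -1;

    assert(dirpath != NULL && name != NULL && out != NULL);
    n = snprintf(full, sizeof full, "%s/%s", dirpath, name);
    if (n < 0 || (size_t)n >= sizeof full)
        return -1;                              /* path too long */

    dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;
    fd = openat(dirfd, name, O_RDONLY);
    if (fd >= 0) {
        if (stat(full, &out[0]) == 0 &&         /* have a path string */
            fstat(fd, &out[1]) == 0 &&          /* have a fd */
            fstatat(dirfd, name, &out[2], 0) == 0)  /* have a dir fd */
            rc = 0;
        close(fd);
    }
    close(dirfd);
    return rc;
}
```

none of this cares which __NR_ number sits underneath on a given architecture, which is the point.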

Time To Get Rid Of errno

Posted Sep 2, 2015 9:12 UTC (Wed) by mirabilos (subscriber, #84359) [Link]

That’s not possible because C has no Carry Flag, and thus no means to signal an error to be different from the return value (BSD uses this: syscalls return positive errnos and CF set; Linux uses negative errnos, but that means that UINT_MAX-4096 is the highest possible return value of a syscall, e.g. limiting file sizes further).