Two new ways to read a file quickly

Posted Mar 18, 2020 10:10 UTC (Wed) by farnz (subscriber, #17727)
In reply to: Two new ways to read a file quickly by nix
Parent article: Two new ways to read a file quickly

Except that for such a potential world to be worth considering, you need to explain how it's plausible.

The "fast syscall optimization" in Solaris on SPARC used the fact that SPARC has 128 syscall entry points in the hardware to optimize up to 128 syscalls. That's over a third of Linux syscalls, more if you ignore all the legacy syscalls (as Solaris could, since it could do the translation from legacy to current in libc). It only had such a drastic effect in Solaris because the "fast" syscalls skipped the generic ABI translation that Solaris chose to do at syscall entry to simplify syscall implementation. In other words, it worked around a known software deficiency in Solaris, stemming from their desire to use the same SunStudio compiler and ABI for all code, rather than teaching SunStudio a separate syscall ABI for kernel code to use.

The vDSO isn't about syscalls per se; the vDSO page is a userspace page that happens to be shared with the kernel and contains userspace code and data supplied by the kernel, allowing you to avoid making a syscall at all.
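As a rough illustration of that point, here is a toy model in Python rather than anything resembling the real vDSO. The class ToyKernel, the "clock_ns" field and both functions are hypothetical names invented for this sketch; the only thing it tries to show is that the vDSO-style path reads kernel-maintained data without a privilege transition.

```python
class ToyKernel:
    """Toy stand-in for a kernel; every name here is hypothetical."""

    def __init__(self):
        # Data the kernel keeps up to date in a page shared with userspace.
        self.shared_page = {"clock_ns": 0}
        # Counts how often userspace actually entered the kernel.
        self.privilege_transitions = 0

    def tick(self, ns):
        # Kernel-side update of the shared data (e.g. from a timer interrupt).
        self.shared_page["clock_ns"] += ns

    def sys_clock_gettime(self):
        # The real syscall path: costs one privilege transition.
        self.privilege_transitions += 1
        return self.shared_page["clock_ns"]


def vdso_clock_gettime(shared_page):
    # Kernel-supplied code running in userspace: it only reads the shared
    # page, so no privilege transition takes place.
    return shared_page["clock_ns"]


kernel = ToyKernel()
kernel.tick(500)
via_vdso = vdso_clock_gettime(kernel.shared_page)
transitions_after_vdso = kernel.privilege_transitions
via_syscall = kernel.sys_clock_gettime()
transitions_after_syscall = kernel.privilege_transitions
```

Both paths return the same value, but only the second one increments the transition counter. The real mechanism is considerably more involved (for instance, it has to cope with the kernel updating the data while userspace reads it), which this sketch deliberately ignores.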

Remember that, at heart, syscalls are four machine micro-operations sequenced sensibly; everything else is built on top of this:

Save the current privilege level, so that you can restore it on return.
Save the next PC so that you can return back here.
Set the current privilege level.
Set PC to a syscall entry point.
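
The four steps above can be sketched as a toy CPU model. This is a minimal illustration in Python, not a description of any real processor: the privilege level values, the entry address 0x8000, the one-byte instruction length and the class ToyCPU are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical privilege levels; lower means more privileged.
USER, KERNEL = 3, 0


@dataclass
class ToyCPU:
    pc: int = 0
    privilege: int = USER
    syscall_entry: int = 0x8000  # hypothetical kernel entry address
    saved: list = field(default_factory=list)  # (privilege, return PC) pairs

    def syscall(self, insn_len: int = 1) -> None:
        saved_privilege = self.privilege   # 1. save the current privilege level
        return_pc = self.pc + insn_len     # 2. save the next PC
        self.saved.append((saved_privilege, return_pc))
        self.privilege = KERNEL            # 3. set the current privilege level
        self.pc = self.syscall_entry       # 4. set PC to a syscall entry point

    def sysret(self) -> None:
        # Undo the entry: restore the saved privilege level and PC.
        self.privilege, self.pc = self.saved.pop()


cpu = ToyCPU(pc=0x1000)
cpu.syscall()
in_kernel = (cpu.privilege, cpu.pc)
cpu.sysret()
back_in_user = (cpu.privilege, cpu.pc)
```

Only step 4 involves a value that could differ from one syscall to another; steps 1 to 3 are identical whichever syscall is being made.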

Any optimization in hardware that leads to a subset of syscalls being faster has to be in the last micro-operation; all the others are common to all syscalls. The only such optimization that's possible is to have alternate syscall entry points for different syscalls; this is what the SPARC trap system does, using a 128-entry trap table to decide which syscall entry point to use.
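
To make the difference concrete, here is a small Python sketch of the two dispatch styles. The handler names, their return values and the "generic_entry_work" counter are hypothetical; the counter merely stands in for whatever common work a single entry point performs before dispatching, such as the ABI translation described earlier.

```python
# Hypothetical syscall handlers; names and return values are made up.
def sys_getpid():
    return 42


def sys_gettime():
    return 1000


counters = {"generic_entry_work": 0}

# Style A: the hardware indexes a trap table, so each syscall can have its
# own entry point and jump straight to its handler.
TRAP_TABLE_ENTRIES = 128
trap_table = [None] * TRAP_TABLE_ENTRIES
trap_table[0] = sys_getpid
trap_table[1] = sys_gettime


def trap(trap_number):
    return trap_table[trap_number]()


# Style B: one entry point for everything; software does the common entry
# work and then dispatches on the syscall number.
syscall_table = {0: sys_getpid, 1: sys_gettime}


def single_entry(syscall_number):
    counters["generic_entry_work"] += 1  # stand-in for shared entry work
    return syscall_table[syscall_number]()


fast_result = trap(1)
generic_result = single_entry(1)
```

Both styles reach the same handler. The trap-table style only wins if the work done at the common entry point is expensive enough to be worth skipping.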

Note, too, that the tendency over time is to optimize the hardware with a single syscall entry point, since that's just a single pointer-sized piece of data to track. On Intel, the 8086 only had INT for syscalls, the 80286 added call gates, and the Pentium II added SYSENTER, which only has a single possible entry point. Similarly, ARM, MIPS, POWER, PowerPC, RISC-V, and AArch64 all only have a single instruction to do syscalls that goes to a single syscall entry point (although on POWER, PowerPC, ARM, and AArch64, that instruction also carries a limited amount of data that's supplied to the kernel, intended for use as a syscall number).

SPARC is the one exception to the rule that more modern architectures only have a single syscall entry point, with its trap table of 128 entries, and even then, it was only a performance win because Solaris was able to use the trap table to get around its own earlier bad decisions around syscall handling.