The GNU C library (glibc) 2.25 release is expected to be available at the beginning of February; among the new features in this release will be a wrapper for the Linux getrandom() system call. One might well wonder why getrandom() is only appearing in this release, given that kernel support arrived with the 3.17 release in 2014 and that the glibc project is supposed to be more receptive to new features these days. A look at the history of this particular change highlights some of the reasons why getting new features into glibc is still hard.
Glibc remains a conservative project. There are a number of good reasons for that, but it does mean that developers proposing new features tend to run into roadblocks; that has certainly happened with getrandom(). The kernel's random number subsystem maintainer, Ted Ts'o, has been known to complain about the delay in support for this system call; he has suggested that "maybe the kernel developers should support a libinux.a library that would allow us to bypass glibc when they are being non-helpful." Peter Gutmann resorted to channeling Sir Humphrey Appleby when describing the glibc project's approach to getrandom(). But what really caused the delay here?
Glibc bug 17252, requesting the addition of getrandom(), was filed in August 2014, five days after the 3.17 kernel release. Glibc developer Joseph Myers responded twice in the following six months, suggesting that, if anybody wanted getrandom() in glibc, they would need to go onto the project's mailing list and work to drive the development forward. The first reason for the delay is thus simple: nobody stepped up to do the work.
One might wonder why it took so long for somebody to come along and implement a simple system-call wrapper. In its essence, the code that will appear in the 2.25 release is:
    /* Write LENGTH bytes of randomness starting at BUFFER.  Return 0 on
       success and -1 on failure.  */
    ssize_t
    getrandom (void *buffer, size_t length, unsigned int flags)
    {
      return SYSCALL_CANCEL (getrandom, buffer, length, flags);
    }
Such a function does not seem particularly hard to write. The original patch for getrandom() support, finally posted by Florian Weimer in June 2016, was rather more complicated than that, though. Weimer, knowing that the glibc project is conservative and wants the library to work in almost all situations, attempted to cover every base he could think of. So the patch included documentation updates, test programs, and several other details that, in turn, led to a number of sticking points that surely slowed the eventual acceptance of the patch.
The first obstacle, though, had little to do with the patch itself; it was, instead, brought about by the project's reluctance to add wrappers for Linux-specific system calls at all. Glibc does not see itself as a Linux-specific project, so it naturally prefers standardized interfaces that can be supported on all systems. The project has sporadically discussed its policy around Linux-specific calls over the last couple of years. In 2015, Myers described it as:
A draft policy for Linux-specific wrappers has existed since about then but, lacking consensus in a strongly consensus-oriented project, it has never achieved any sort of official status. Thus, even though this policy states that system-call wrappers should be added by default in the absence of reasons to the contrary, Roland McGrath responded to the initial patch posting with a terse message saying: "You need to start with rationale justifying the new nonstandard API and why it belongs in libc." That justification was not hard, given that a number of projects have been asking for this wrapper, and that adding the BSD getentropy() interface on top of it is easily done, but this challenge foreshadowed much of what was to come.
A trickier question was: what should glibc do when running on pre-3.17 kernels (or non-Linux kernels) that lack getrandom() support? The initial patch included a set of emulation functions so that getrandom() calls would always work; they would read the data from /dev/random or /dev/urandom as appropriate. Doing so involved keeping open file descriptors to those devices (lest later calls fail if the application does a chroot()). But using file descriptors in libraries is always fraught with perils; applications may have their own ideas of which descriptors are available, or may simply run a loop closing all descriptors. So the code took pains to use high-numbered descriptors that applications presumably don't care about, and it used fstat() to ensure that the application had not closed and reopened its descriptors between calls.
This usage of file descriptors drew a number of comments; it is something that glibc tries to avoid whenever possible. After some discussion, it was concluded that glibc should provide only a wrapper for the system call, without emulation. If an application calls getrandom() on a kernel where that system call is not supported, the glibc wrapper will simply fail with errno set to ENOSYS and it will be up to the application to use a fallback. That decision removed a fair amount of code and one obstacle to merging.
In writing the patch, Weimer worried that there may be a number of applications out there with their own function called getrandom(), which may or may not provide the same interface and semantics as the glibc version. The prospect was especially troubling because a getrandom() call that does not actually return random data may not cause any visible problems in the application at all — until some attacker notices this behavior and exploits it. So he employed a bunch of macro and symbol-versioning trickery to detect and prevent confusion over which getrandom() function to use.
This feature, too, was unpopular; glibc does not normally add extra layers of protection around its symbols in this way. The tricks made it impossible to take the address of the function, among other things. After extensive discussion, Weimer backed down and removed the interposition protection, but he clearly was not entirely happy about it.
The most extensive argument, though, was over whether getrandom() should be a thread cancellation point. In other words, what should happen if pthread_cancel() is called on a thread that is currently blocked in getrandom()? The original patch did make getrandom() into a cancellation point; it still behaves that way in the version merged for 2.25, but it had to survive a lot of argument to get there.
Weimer wanted getrandom() to be a cancellation point because the system call can block indefinitely, even if it almost never blocks at all. The Python os.urandom() episode showed that this blocking can, in rare situations, cause real problems. So, he said, it should be possible for a cancellation-aware program to respond to an overly slow getrandom() call.
The objections here seemed to be, for the most part, objections to cancellation points in general. It is true that cancellation points are problematic in a number of ways. To the implementation issues one can add the fact that most programs are not cancellation-aware and may not respond well to a thread cancellation in an unexpected place. A version of getrandom() that adds a new cancellation point could thus lead to unfortunate behavior. Additionally, getrandom() is supposed to always succeed; the possibility of cancellation adds a failure mode that is not a part of the system call itself.
On the other hand, Carlos O'Donell argued that getrandom() is analogous to read() and thus should behave the same way; read() is a cancellation point. The argument went back and forth over months, and included detours into whether there should be a separate getrandom_nocancel() function or an additional "cancellation point please" argument to getrandom(). In the end, getrandom() remained an unconditional cancellation point. The BSD-compatible getentropy() implementation included in the patch is not a cancellation point, though.
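To make the relationship between the two interfaces concrete, here is a minimal sketch of a getentropy()-style function layered on getrandom(). It is not the glibc implementation: the name sketch_getentropy() is hypothetical, the 256-byte limit and EIO error follow the documented getentropy() behavior, and, because it simply calls the cancellable getrandom() wrapper, this version would itself be a cancellation point, unlike the merged getentropy().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical getentropy()-like function built on getrandom().
   Fills BUFFER with exactly LENGTH bytes or fails; there are no
   partial results.  Returns 0 on success and -1 on failure.  */
int
sketch_getentropy (void *buffer, size_t length)
{
  /* getentropy() refuses requests larger than 256 bytes.  */
  if (length > 256)
    {
      errno = EIO;
      return -1;
    }

  unsigned char *p = buffer;
  size_t done = 0;
  while (done < length)
    {
      ssize_t n = getrandom (p + done, length - done, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Retry after a signal.  */
          return -1;            /* Includes ENOSYS on old kernels.  */
        }
      done += (size_t) n;
    }
  return 0;
}
```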
With these issues resolved, the conversation came to a close on December 12 when getrandom() and getentropy() were merged into the glibc repository. A feature that has been shipping in the Linux kernel for over two years will finally be available to application developers without the need to create special system-call wrappers. Now all that's left is all the other Linux-specific system calls that still lack glibc wrappers.
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:06 UTC (Mon) by quotemstr (subscriber, #45331) [Link]
FWIW, Windows programmers have no qualms about keeping private HANDLEs around despite the ability under Windows to enumerate HANDLE values, do the equivalent of dup2, and so on.
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:51 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 0:57 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 20:53 UTC (Tue) by lsl (subscriber, #86508) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 22:08 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
And perhaps a thread/process-wide default.
The long road to getrandom() in glibc
Posted Jan 13, 2017 13:03 UTC (Fri) by alonz (subscriber, #815) [Link]
For even cleaner semantics, make all such new-space FDs always behave as if O_CLOEXEC was set. You then magically get rid of (all? most?) arguments against FD randomization.
The long road to getrandom() in glibc
Posted Jan 13, 2017 19:59 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
The long road to getrandom() in glibc
Posted Jan 13, 2017 21:20 UTC (Fri) by andresfreund (subscriber, #69562) [Link]
The long road to getrandom() in glibc
Posted Feb 14, 2017 18:53 UTC (Tue) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 8:04 UTC (Tue) by vstinner (subscriber, #42675) [Link]
"I've seen an issue with using urandom on Python 3.4. I've traced down to fd being closed (not by core CPython, but by third party library code). After this, access to urandom fails. (....) OSError: [Errno 9] Bad file descriptor"
The workaround is to call fstat() and store st_dev and st_ino to check if the FD was *closed or replaced*. If the FD was replaced, os.urandom() leaves the FD open because "it probably points to something important for some third-party code" and opens a new FD... Not ideal, but "it works"...
Getting random bytes from the OS in a portable way takes around 600 lines of code: https://github.com/python/cpython/blob/master/Python/rand... !
The long road to getrandom() in glibc
Posted Jan 10, 2017 8:39 UTC (Tue) by vstinner (subscriber, #42675) [Link]
Hum, I forgot to explain why, it's also an interesting story. With a lot of threads and high system load, the Python os.urandom() function failed with the NotImplementedError("/dev/urandom ...") exception:
http://bugs.python.org/issue18756
The C code considered that the device is not available if open("/dev/urandom", O_RDONLY) fails with an error. There was no specific case for EMFILE or ENFILE error. The private FD was added to use at most one file descriptor.
Note: Java also keeps one persistent FD to /dev/urandom.
The long road to getrandom() in glibc
Posted Jan 10, 2017 13:28 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 14:36 UTC (Tue) by pizza (subscriber, #46) [Link]
Meanwhile, in the real world, the programs dictate the choice of your system, not the other way around.
The long road to getrandom() in glibc
Posted Jan 10, 2017 16:14 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 14:53 UTC (Tue) by nix (subscriber, #2304) [Link]
Programs that do clearly broken things like dereferencing NULL pointers are freely broken (unlike on, say, Solaris or God forbid hpux) -- but programs that do things that Unix programs have been doing for decades in very large numbers can't just be broken by fiat like that, even if they are objectively horrible things to do.
The long road to getrandom() in glibc
Posted Jan 10, 2017 16:16 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 16:51 UTC (Tue) by excors (subscriber, #95769) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 17:07 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
Programs that access resources that they do not own are broken and need to be fixed no matter how painful it may be. We have symbol versioning and such to preserve old, broken behavior for old, broken programs, and the versioning approach will continue to work for private file descriptors. Programs compiled these days need to be modified so that they don't free resources that they don't own.
The long road to getrandom() in glibc
Posted Jan 10, 2017 18:15 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
> If libraries cannot have private file descriptors
They cannot, not safely.
> SysV-like ad-hoc handles for resources instead of file descriptors? Some kind of alternate file descriptor namespace?
The same descriptor namespace, but growing down from some large value (MAXINT?) and not having all the brain-deadness associated with regular descriptors.
The long road to getrandom() in glibc
Posted Jan 11, 2017 14:53 UTC (Wed) by mirabilos (subscriber, #84359) [Link]
The long road to getrandom() in glibc
Posted Jan 21, 2017 4:32 UTC (Sat) by njs (guest, #40338) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 10:13 UTC (Tue) by smcv (subscriber, #53363) [Link]
If you fork-and-exec, you still have to coordinate with the other libraries in the same process to make sure they use O_CLOEXEC, SOCK_CLOEXEC, FD_CLOEXEC, etc. on every fd (in practice not necessarily feasible if you use lots of libraries); or iterate through fds between fork and exec to close them all, except for a whitelist of deliberately-inherited fds. Otherwise, your activated D-Bus service inherits "private" fds from dbus-daemon's use of libselinux, for example.
(I maintain libdbus and contribute to GLib, both of which: want to use both private fds and fork/exec; have had bugs where their private fds were not close-on-exec despite going to some effort to add the right CLOEXEC flags everywhere; and aim to be portable to platforms that don't have CLOEXEC, so have to have fallback code to use FD_CLOEXEC every time they open a fd anyway.)
The long road to getrandom() in glibc
Posted Jan 10, 2017 13:33 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
If you want to be portable, sure, close FDs in the child before exec. You'll be the only thread running, and you'll be running only async-signal-safe code anyway, so what's the problem?
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:14 UTC (Mon) by nix (subscriber, #2304) [Link]
Florian is a tower of strength without which glibc would be much less secure than it is now: almost all the release-to-release security improvements in glibc have his imprint on them somewhere. (Joseph is a similar power in the land with regard to making libm work better.)
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:43 UTC (Mon) by PaXTeam (guest, #24616) [Link]
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:56 UTC (Mon) by ay (subscriber, #79347) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 4:19 UTC (Tue) by thestinger (subscriber, #91827) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 11:38 UTC (Tue) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 17:09 UTC (Tue) by thestinger (subscriber, #91827) [Link]
SafeStack is ready to be applied to entire distributions. HardenedBSD is using it globally and Google is working on integrating it into Android (it's mostly stalled on bikeshedding at the moment). It separates the stack into one with all of the return pointers, register spills and safe data (no overflows or address leaks) and keeps the data where overflows can occur separately. It barely has a performance hit, and if you really want to you can still use SSP with it. Linear overflows can't get to the safe stack as long as there are guard pages, so it's a deterministic defence against that with probabilistic mitigation of arbitrary writes via ASLR / libc stack randomization (reserve random runs of guard pages on either side - and with typical 8M stacks, there's nearly zero cost since they already span multiple 2M regions anyway). It's too bad that there aren't hardware features available for a great implementation anymore (i.e. segmentation).
Clang's CFI implementation is similarly ready for deployment and a subset of it (C++ virtual method calls) is being used by Chrome's 64-bit Linux builds (distributions can use it too but they typically don't care about stuff like this) already with the type cast checking on the horizon. It's annoying though, since it depends on LTO and requires fixing a bunch of undefined indirect calls. It only protects indirect calls rather than also covering returns (performance / size would be a major issue for returns). LTO is risky since it makes all of the latent undefined behavior in real world code significantly more dangerous and we don't have good tools to catch it. UBSan and ASan are able to catch a subset but are missing tons of coverage, and only catch it when it occurs at runtime which may not happen for UB resulting in vulnerabilities in edge cases unless you use trapping UBSan which is costly and can be painful to debug but is production ready at least in Clang.
The long road to getrandom() in glibc
Posted Jan 13, 2017 18:04 UTC (Fri) by nix (subscriber, #2304) [Link]
> -fstack-protector-strong is better than nothing, but there are tons of memory corruption vulnerabilities and the most common ones are now heap overflows and use-after-free, not stack overflows and particularly not linear stack overflows
Well yeah, but there was no point my implementing a fix for that because Florian's already working on one.
It specifically protected me against CVE-2015-7547. That one made headlines and was a remote exploit. Obviously it doesn't protect you against everything: why on earth would anyone assume that it would?
The long road to getrandom() in glibc
Posted Feb 14, 2017 19:04 UTC (Tue) by nix (subscriber, #2304) [Link]
> SafeStack is ready to be applied to entire distributions.
SafeStack is really cool, but by its very nature splitting the stack in two is a great big ABI break, requiring coordinated rebuilds of literally everything. There's a reason Android and the BSDs are doing it first -- as integrated systems, that sort of big cross-project change is much easier for them. (Also, they frankly don't have as much weird edge-case software doing strange things as Linux does. Most of that software is quite unimportant, I suppose.)
The long road to getrandom() in glibc
Posted Jan 10, 2017 21:12 UTC (Tue) by PaXTeam (guest, #24616) [Link]
replacing speculation with real data, ssp-all has an overhead of >25% on a workload where RAP has <5%. should i ask Intel for a refund since my CPU doesn't seem to know or care about those 'pipeline stalls'? ;)
> probably of per-ELF-object random objects filled in by the kernel a-la OpenBSD,
just for the record, that idea doesn't originate from OpenBSD but from a hardened gentoo discussion back in around 2003-2004 or so (maybe even earlier).
> which will hopefully let us stack-protect the one unprotected piece, ld.so.
i've had no problems protecting glibc (ld.so included) with RAP, so i guess you're just running into (and not fixing) the implementation mistakes of SSP.
> Obviously there are better approaches, but are there any ready to apply to entire distros? No.
i recompiled a gentoo system with enough packages to run chromium and had no problems with RAP's XOR cookie approach whatsoever.
> I'm merely extending protections that are already there for the entire distro bar glibc to glibc as well.)
IMHO your time would be much better spent on fixing glibc's abuse of function pointers as there's lots of horror and actual bugs to be found there...
The long road to getrandom() in glibc
Posted Jan 11, 2017 16:33 UTC (Wed) by clump (subscriber, #27801) [Link]
The long road to getrandom() in glibc
Posted Jan 11, 2017 17:10 UTC (Wed) by PaXTeam (guest, #24616) [Link]
The long road to getrandom() in glibc
Posted Jan 11, 2017 18:12 UTC (Wed) by pizza (subscriber, #46) [Link]
Methinks you would do well to follow your own advice.
The long road to getrandom() in glibc
Posted Jan 11, 2017 18:23 UTC (Wed) by PaXTeam (guest, #24616) [Link]
The long road to getrandom() in glibc
Posted Jan 13, 2017 18:08 UTC (Fri) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 14, 2017 20:43 UTC (Sat) by jospoortvliet (subscriber, #33164) [Link]
The long road to getrandom() in glibc
Posted Jan 15, 2017 0:53 UTC (Sun) by spender (subscriber, #23067) [Link]
"replacing speculation with real data, ssp-all has an overhead of >25% on a workload where RAP has <5%. should i ask Intel for a refund since my CPU doesn't seem to know or care about those 'pipeline stalls'? ;)"
That is literally the only thing he said that could potentially be construed as "impolite" -- personally I find it impolite to be spreading false information or making bogus claims with nothing at all to back it up, especially when you're talking to the person who *has already done the work and knows you are wrong*.
Then we have nix's reply (he often likes to get the last word in via tone argument when he's lost the technical argument, as he always does), where he says: "in the absence of actual facts rather than venom" aka exactly the phrasing the PaX Team used to inspire these stupid comments (followed by attacking a strawman to make him look smart, unfortunately it has nothing to do with what the PaX Team was referring to re: abuse of function pointers. Thinking for 10 seconds about what kind of function pointer abuse one would learn of after having developed a CFI system that essentially enforces type correctness for indirect function calls might clue a person in to what was being talked about, especially when this work has been applied to glibc, but not nix -- his focus is pointless kneejerk responses to make himself feel better). In reply, someone calls his response "graceful". Give me a break.
What's disrespectful, impolite, and condescending is everyone's pointless tone arguments and spreading of false information. If you don't know what you're talking about, the proper way to respond to someone is to ask a question and learn something, not to pose as an expert in something you're clearly not and waste time arguing with someone who knows better. I doubt you will ever find us being "impolite" to anyone asking legitimate questions who are trying to learn. People who aren't here to learn and are just misinforming others with their own ignorance are the problem. And we all know who these people are, because they show up in nearly every thread on this site, and surprise, they think they're an expert in every topic here. Put your collective big boy pants on, and quit it with these pointless replies.
-Brad
The long road to getrandom() in glibc
Posted Feb 14, 2017 18:43 UTC (Tue) by nix (subscriber, #2304) [Link]
(back after several weeks away because spender angered me enough that it poisoned my view of the whole site for some time)
> he often likes to get the last word in via tone argument when he's lost the technical argument, as he always does)
See, that's the thing, isn't it? You consider this an 'argument' that you have to win and be seen to be right, no matter the consequences. I see this as a cooperative venture, all together. The two worldviews are incompatible. I think you're repulsive and that complaining about your tone is crucial because it makes everything you do less useful to everyone: you think I'm ignorant and that I'm complaining about your tone as some sort of get-out because I can't attack your content. (I don't want to attack your content, since there's nothing wrong with it when you actually bother to back up your arguments rather than just posting MD5 hashes as proof that you knew something before anyone else, or randomly insulting people who don't treat everything you say as gospel truth. I have never claimed not to be ignorant: of course I don't know everything about everything, and of course not everything I say is always right. That's true of everyone who hasn't taken a vow of silence. Except, it appears, for you, or so you appear to believe.)
> Put your collective big boy pants on, and quit it with these pointless replies.
Maybe you should tell PaXTeam that, since he is the very one who responded to my comment by casting aspersions on my work but leaving it up to everyone else to do the necessary research to tell what he was talking about. (Said comment was intended to compliment the glibc developers, not toot my own horn, not that you appeared to grasp that; it was not addressed to either of you, and in fact I'd be very glad if you killfiled me). That sort of response seems to me to be the very definition of a pointless reply: actually worse than pointless, since it's an attack without the evidence to back it up, on a site whose comment threads used to be known for their civility as well as their utility.
But, of course, that was before you two came along.
The long road to getrandom() in glibc
Posted Jan 13, 2017 18:07 UTC (Fri) by nix (subscriber, #2304) [Link]
e.g. his flaming about glibc's function pointer use, for instance, well, in the absence of actual facts rather than venom it's hard to tell what he's talking about, since glibc's function pointers have been XORed with a random cookie for a very long time now. I guess he's talking about a few remaining unrandomized libio pointers in the FILE * -- and, y'know, if we didn't give a damn about compatibility we could randomize those too. Unfortunately, we do, and randomizing them breaks real existing applications.
The long road to getrandom() in glibc
Posted Jan 13, 2017 19:02 UTC (Fri) by clump (subscriber, #27801) [Link]
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:44 UTC (Mon) by karkhaz (subscriber, #99844) [Link]
Perhaps a dumb question, but why not surround the function in IFDEFs so that it doesn't get compiled in on such kernels? So that clients building for other systems get a compilation error rather than a runtime error that they might not even be checking for.
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:47 UTC (Mon) by corbet (editor, #1) [Link]
The function is indeed stubbed out on platforms where the system call cannot exist. But the kernel the library is compiled under is not necessarily the kernel on which it will run at any given time; the two components are not that tightly tied together.
The long road to getrandom() in glibc
Posted Jan 9, 2017 23:55 UTC (Mon) by ay (subscriber, #79347) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 2:48 UTC (Tue) by ncm (subscriber, #165) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 11:01 UTC (Tue) by josh (subscriber, #17465) [Link]
Please don't ever do that.
Version detection would break, for instance, if the kernel configuration compiles out the system call to save space, supported for an increasing number of system calls (and hopefully all of them eventually). It would also break if someone backported the system call. Or if the kernel wired up the system call in different versions for different architectures or ABIs.
Always call the syscall, check for ENOSYS, and fall back or error out as appropriate.
The long road to getrandom() in glibc
Posted Jan 10, 2017 15:10 UTC (Tue) by ay (subscriber, #79347) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 17:25 UTC (Tue) by zlynx (subscriber, #2285) [Link]
Of course, _your_ code will never do it wrong. But _someone's_ will and then the OS will need yet another lame hack to report version 9.99.
As an example from another OS, Microsoft has got tired enough of these version check bugs that they've made getting the actual OS version quite difficult. The regular version check only returns the minimum of the OS or the program manifest so that software built for Windows 10 will always return version 10 even on Windows 15. And if there isn't a manifest it gets Windows 8.1.
The long road to getrandom() in glibc
Posted Jan 12, 2017 3:12 UTC (Thu) by cjwatson (subscriber, #7322) [Link]
The long road to getrandom() in glibc
Posted Jan 21, 2017 4:42 UTC (Sat) by njs (guest, #40338) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 4:04 UTC (Tue) by busterb (subscriber, #560) [Link]
That should make glibc itself not work on kernels too old to support the syscall, removing the need for ENOSYS or backward compatibility shims. Then from an application point-of-view, the wrapper either exists, or the code doesn't run in the first place.
That's not a lot different than glibc 2.24 requiring kernel 2.6.32 or later on x86 (others?). Or is this also an optional syscall even on newer kernels?
Kernel and libc versions
Posted Jan 10, 2017 8:11 UTC (Tue) by vstinner (subscriber, #42675) [Link]
If the builder is too old, the program lacks new features (ex: don't try to use getrandom()). If the builder is too recent, the program tries to use too recent a function, which fails with ENOSYS (or differently, sometimes in subtle ways, see below).
Python is full of runtime checks for recent Linux kernel features: open(O_CLOEXEC), socket(SOCK_CLOEXEC), getrandom(), etc.
For open(O_CLOEXEC), the check is not as simple as ENOSYS or EINVAL. On older kernels, the flag was simply ignored! Python has to check on the first open() call if the flag was correctly set. Otherwise, it remembers that the flag is ignored and sets the flag in a second syscall (ioctl or fcntl, again depending on the availability of the ioctl or not).
The long road to getrandom() in glibc
Posted Jan 10, 2017 11:51 UTC (Tue) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 5:55 UTC (Tue) by eru (subscriber, #2753) [Link]
> the project's reluctance to add wrappers for Linux-specific system calls at all.
Is glibc really being used on non-Linux systems in practice? BSD's seem to prefer their own BSD-licensed libc. I guess there is Cygwin, but it has to create emulations for most of Linux calls anyway, so it could add getrandom() itself.
The long road to getrandom() in glibc
Posted Jan 10, 2017 6:08 UTC (Tue) by pabs (subscriber, #43278) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 11:51 UTC (Tue) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 7:35 UTC (Tue) by jaromil (guest, #97970) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 11:54 UTC (Tue) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 11:55 UTC (Tue) by nix (subscriber, #2304) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 13:19 UTC (Tue) by mm7323 (subscriber, #87386) [Link]
You can create cancellation points anywhere with void pthread_testcancel(void). The harder thing is stopping cancellation in a library (you can only really defer it with pthread_setcancelstate()).
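A minimal sketch of the deferral half of that, assuming POSIX threads: a library wrapper that disables cancellation around a callback and then restores whatever state the caller had. Both function names are hypothetical; query_cancel_state() exists only so the effect can be observed.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: run FN(ARG) with thread cancellation disabled,
   then restore the caller's previous cancel state.  A cancellation
   request that arrives meanwhile stays pending; it is not lost.  */
int
run_uncancellable (int (*fn) (void *), void *arg)
{
  int old_state = PTHREAD_CANCEL_ENABLE;
  int result;

  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_state);
  result = fn (arg);
  pthread_setcancelstate (old_state, NULL);
  return result;
}

/* Hypothetical probe: report the calling thread's current cancel
   state.  POSIX has no pure getter, so set the state and put it
   straight back.  ARG is unused.  */
int
query_cancel_state (void *arg)
{
  int state = PTHREAD_CANCEL_ENABLE;

  (void) arg;
  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &state);
  pthread_setcancelstate (state, NULL);
  return state;
}
```

After run_uncancellable() returns, a library that wants a pending request to take effect promptly could call pthread_testcancel(), which is the explicit cancellation point mentioned above.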
The long road to getrandom() in glibc
Posted Jan 10, 2017 15:54 UTC (Tue) by carlos.odonell (subscriber, #99737) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 22:08 UTC (Tue) by mm7323 (subscriber, #87386) [Link]
> You cannot easily create arbitrary deferred cancellation points inside a syscall
I think the original question was about creating cancellation points in a library. It can be done.
That said, thread cancellation is very messy, non-obvious in the code and prone to resource leaks, corruption and race conditions. Except in the very simplest of cases, I would advise anyone considering trying to use thread cancellation to find a more reliable method.
Python 3.6, glibc 2.25, getentropy() and kernel < 3.17
Posted Jan 10, 2017 8:24 UTC (Tue) by vstinner (subscriber, #42675) [Link]
When I wrote Python/random.c, I added support for OpenBSD getentropy(). On OpenBSD, packages are built on the same OpenBSD version as the one used to run the program, so the Python function calling getentropy() doesn't check for ENOSYS. glibc 2.25 added getrandom() and getentropy(), but Python tries getentropy() first. Sadly, the Python package was built on a host with a more recent kernel and libc than the user's OS, and so users got the initialization error: the getentropy() function calls the getrandom() syscall, which fails with ENOSYS.
I modified Python to handle ENOSYS and EPERM in getentropy(), and also modified the code to prefer getrandom() over getentropy(), because getrandom() supports non-blocking urandom, which Python requires; see the infamous PEP 524.
https://www.python.org/dev/peps/pep-0524/
Note: Python also has to handle EPERM because a user reported that a security policy, called "QNAP", blocked the getrandom() syscall: http://bugs.python.org/issue27955
Again, providing a Python portable os.urandom() function which has almost the same properties on all platforms and all minor platform versions is a hard challenge!
The long road to getrandom() in glibc
Posted Jan 10, 2017 16:02 UTC (Tue) by mmechri (subscriber, #95694) [Link]
The long road to getrandom() in glibc
Posted Jan 10, 2017 17:09 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
If we do see a separate library of separate system call wrappers, it'll be because we've failed to create a coherent system and instead "shipped the org chart".
The long road to getrandom() in glibc
Posted Jan 12, 2017 2:44 UTC (Thu) by busterb (subscriber, #560) [Link]
Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the
Creative
Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds