
Why glibc's fstat() is slow

By Jonathan Corbet
September 14, 2023
The fstat() system call retrieves some of the metadata — owner, size, protections, timestamps, and so on — associated with an open file descriptor. One might not think of it as a performance-critical system call, but there are workloads that make a lot of fstat() calls; it is not something that should be slowed unnecessarily. As it turns out, though, the GNU C Library (glibc) has been doing exactly that, but a fix is in the works.

Mateusz Guzik has been working on a number of x86-related performance issues recently. As part of that work, he stumbled into the realization that glibc's implementation of fstat() is expressed in terms of fstatat(). Specifically, a call like:

    result = fstat(fd, &stat_buf);

is turned into:

    result = fstatat(fd, "", &stat_buf, AT_EMPTY_PATH);

These calls are semantically equivalent; on Linux, a call to fstatat() providing an empty string for the path and the AT_EMPTY_PATH flag operates directly on the provided file descriptor. Inside the kernel, though, the two are handled differently, and implementing fstat() in this way is measurably slower, for a couple of reasons.
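To make the equivalence concrete, here is a minimal sketch that stats the same descriptor both ways and compares a few fields. The helper name stat_paths_agree() is hypothetical and not part of glibc or the kernel; the sketch assumes a Linux system with glibc, where _GNU_SOURCE exposes AT_EMPTY_PATH.

```c
#define _GNU_SOURCE          /* exposes AT_EMPTY_PATH in <fcntl.h> on glibc */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the article): stat one descriptor through
 * both library calls and report whether the results describe the same file.
 * Returns 1 if they agree, 0 if they differ, -1 if either call fails.
 */
int stat_paths_agree(int fd)
{
    struct stat direct, via_at;

    if (fstat(fd, &direct) != 0)
        return -1;

    /* Empty path plus AT_EMPTY_PATH: operate on fd itself. */
    if (fstatat(fd, "", &via_at, AT_EMPTY_PATH) != 0)
        return -1;

    return direct.st_dev  == via_at.st_dev  &&
           direct.st_ino  == via_at.st_ino  &&
           direct.st_mode == via_at.st_mode &&
           direct.st_size == via_at.st_size;
}
```

Calling stat_paths_agree() on any open descriptor should return 1; the two calls differ in cost, not in result.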

One of those is that fstatat() is a more complex system call, so it does preparatory work that is not useful for the simple fstat() case. Once alerted to the problem, Linus Torvalds posted a patch that detects this case and avoids that extra work. But the result is still, according to Guzik's measurements, about 12% slower than calling fstat() directly.

That performance loss is the result of the second problem: fstatat() must check the provided path and ensure that it is empty. One might think that it makes no sense to even look at the path when the user has provided a flag (AT_EMPTY_PATH) that says there is nothing to be seen there but, as Al Viro pointed out, POSIX mandates this behavior. Checking the path means accessing user-space data from the kernel; that, in turn, can require disabling guardrails like supervisor mode access prevention. It all adds up to a significant amount of overhead to check an empty string.

Torvalds made it clear that he thought glibc's behavior made no sense and wondered why things were done that way. A bit later, though, he found a plausible explanation for this choice. On an x86-64 system, the kernel exports a number of related system calls, including fstat() (number 5) and newfstatat() (number 262). Torvalds concluded:

The glibc people found a "__NR_newfstatat", and thought it was a newer version of 'stat', and didn't find any new versions of the basic stat/fstat/lstat functions at all. So they thought that everything else was implemented using that 'newfstatat()' entrypoint.

But on x86-64 (and most half-way newer architectures), the regular __NR_stat *is* that "new" stat.

The "new" fstat(), after all, came about in the 0.97 release in 1992, so there was no reason for the x86-64 architecture (which arrived rather later than that) to use anything else. But, if Torvalds's explanation reflects reality, the glibc developers were fooled by the "new" part of the newfstatat name and passed over the entry point they should have used to implement fstat().
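The two kernel entry points under discussion can be reached directly with syscall(2), bypassing whatever the C library's fstat() wrapper chooses to do. The following is only a sketch: the function names are hypothetical, and it assumes a 64-bit architecture such as x86-64, where both SYS_fstat and SYS_newfstatat are defined and the library's struct stat matches the kernel's layout. On architectures lacking one of the entry points, the corresponding function simply fails.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Plain fstat entry point (number 5 on x86-64). Returns 0 or -1. */
int raw_fstat(int fd, struct stat *st)
{
#ifdef SYS_fstat
    return (int)syscall(SYS_fstat, fd, st);
#else
    (void)fd; (void)st;
    return -1;               /* entry point not provided on this arch */
#endif
}

/* newfstatat entry point (number 262 on x86-64) with an empty path. */
int raw_newfstatat_empty(int fd, struct stat *st)
{
#ifdef SYS_newfstatat
    /* The kernel must read the "" string from user space to verify it. */
    return (int)syscall(SYS_newfstatat, fd, "", st, AT_EMPTY_PATH);
#else
    (void)fd; (void)st;
    return -1;               /* entry point not provided on this arch */
#endif
}
```

Timing a loop over each function is one way to reproduce the kind of comparison described above, though the numbers will depend on the kernel version and hardware.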

There are a few observations that one could make from this little bit of confusion:

  • The system calls (and their names) provided at the kernel boundary are not the same as those expected by user-space programmers. The glibc people know this better than anybody else, since part of their job is to provide the glue between those two interfaces, but confusion still seems to happen.
  • The fact that the kernel's documentation of the interface it presents to user space is ... mostly nonexistent ... certainly does not help prevent confusion of this type.
  • Using qualifiers like "new" in the naming of functions, products, or one's offspring tends to be unwise; what is new today is old tomorrow.

Be that as it may, even with Torvalds's change (which was merged for the 6.6-rc1 release and will presumably show up in a near-term stable update), fstat() is slower than it needs to be when glibc is being used. In an attempt to improve the situation, Guzik raised the issue on the libc-alpha list. Adhemerval Zanella Netto responded that the library developers are trying to simplify their code by using the more generic system calls whenever possible, that the AT_EMPTY_PATH problem is likely to affect all of the *at() system calls, and that, as a consequence, the problem would be "better fixed in the kernel".

Torvalds pointed out that, while other system calls have to handle AT_EMPTY_PATH, fstatat() is the only one that is likely to matter from a performance perspective; none of the others should be expected to show up as problems in real-world programs. Meanwhile, despite the misgivings expressed previously, Zanella put together a patch causing glibc to use ordinary fstat() when appropriate. Torvalds agreed that it looked correct, but complained that the implementation was messy; he seemed to prefer an alternative implementation that Zanella posted later.

As of this writing, neither version of the patch has found its way into the glibc repository; the latter version is under consideration. It is probably safe to assume that a version of this patch will be applied at some point; nobody has an interest in glibc being slower than it needs to be. This particular story has a happy ending, but it does stand as an example of what can happen in the absence of clarity around the interfaces between software components.


Why glibc's fstat() is slow

Posted Sep 14, 2023 17:37 UTC (Thu) by wsy (subscriber, #121706) [Link] (6 responses)

glibc's localtime() is slow without TZ environment variable.

https://blog.packagecloud.io/set-environment-variable-sav...

Why glibc's fstat() is slow

Posted Sep 14, 2023 19:07 UTC (Thu) by brenns10 (subscriber, #112114) [Link] (2 responses)

It's wild how many places that pops up! I fixed that in procps-ng where it would do a stat() of /etc/localtime multiple times *per process* because of the printf formats being used. I'm sure tons of software, especially multithreaded apps, are experiencing this and slowdowns associated with it.

https://gitlab.com/procps-ng/procps/-/merge_requests/119

Why glibc's fstat() is slow

Posted Sep 14, 2023 21:56 UTC (Thu) by guillemj (subscriber, #49706) [Link]

There is https://sourceware.org/bugzilla/show_bug.cgi?id=24004 with a patch attached. You can use localtime_r(3) to avoid this problem. And at least for multithreaded code I'd expect it to be using that already, but otherwise other code might indeed suffer from this issue.
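As a rough illustration of the localtime_r(3) suggestion, the sketch below formats a timestamp using the reentrant call. The helper name format_local() is hypothetical. POSIX does not require localtime_r() to call tzset(), so the sketch assumes the program calls tzset() once at startup.

```c
#define _POSIX_C_SOURCE 200809L   /* declares localtime_r() and tzset() */
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: write "YYYY-MM-DD HH:MM:SS" for t into buf.
 * Returns the number of characters written, or 0 on failure.
 * Assumes the caller has already run tzset() once.
 */
size_t format_local(time_t t, char *buf, size_t len)
{
    struct tm tm;

    /* Reentrant conversion into a caller-owned struct tm. */
    if (localtime_r(&t, &tm) == NULL)
        return 0;

    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
}
```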

speed of repeated localtime() etc.

Posted Sep 16, 2023 0:30 UTC (Sat) by jreiser (subscriber, #11027) [Link]

Software which appends entries to a log file can call localtime (or its relatives) once per event, and there can be dozens or hundreds of events per second, for instance open(). In most cases the speed of logging can be improved by remembering the headway between the current gettimeofday() and the next future gettimeofday() when the relevant portion of localtime() can change. For an output format of HH:mm:ss, the headway is limited to at most 1 second. For the local timezone, the minimum headway can be up to 30 minutes. If gettimeofday() is still within the headway, then there is no need to consult any external authority.
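The simplest form of this idea, for an HH:MM:SS format where the headway is at most one second, could be sketched as a cache keyed on the current second. The struct and function names here are hypothetical, and this version caches the whole formatted string rather than tracking the longer headway to the next timezone transition.

```c
#define _POSIX_C_SOURCE 200809L   /* declares localtime_r() */
#include <stddef.h>
#include <time.h>

/* Hypothetical cache: formatted text plus the second it is valid for. */
struct stamp_cache {
    time_t        valid_sec;    /* second that text[] was computed for */
    int           filled;       /* nonzero once text[] holds a value */
    char          text[9];      /* "HH:MM:SS" plus terminating NUL */
    unsigned long conversions;  /* how often localtime_r() actually ran */
};

/*
 * Return the HH:MM:SS string for 'now', converting only when the second
 * has changed since the previous call. Returns NULL on conversion failure.
 */
const char *cached_stamp(struct stamp_cache *c, time_t now)
{
    if (!c->filled || now != c->valid_sec) {
        struct tm tm;

        if (localtime_r(&now, &tm) == NULL)
            return NULL;
        strftime(c->text, sizeof c->text, "%H:%M:%S", &tm);
        c->valid_sec = now;
        c->filled = 1;
        c->conversions++;
    }
    return c->text;
}
```

A logger would keep one zero-initialized struct stamp_cache (per thread, if threaded) and pass the seconds field of its current timestamp; hundreds of events within the same second then cost a single conversion.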

Why glibc's fstat() is slow

Posted Sep 14, 2023 21:18 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

I despise the fact that POSIX insisted on standardizing the ugly TZ env variable as the standard interface for controlling timezones. It makes some amount of sense to have a process-wide value for "the local timezone." It makes considerably less sense to make that the *only* way of doing tz-aware calculations. If I want to know "what time is it in [some specific part of] Australia?", it does not necessarily follow that I want the whole process to think it is actually in Australia.

Why glibc's fstat() is slow

Posted Sep 15, 2023 0:42 UTC (Fri) by butlerm (subscriber, #13312) [Link]

tzalloc, tzfree, localtime_rz, ctime_rz (and a few others) would seem like fit candidates for POSIX standardization.

Why glibc's fstat() is slow

Posted Oct 4, 2023 16:31 UTC (Wed) by roblucid (guest, #48964) [Link]

POSIX was documenting a standard with compromise between vendors, not generally a group of developers, creating new specs.
Commercial vendors tend to embrace and extend which resulted in fragmenting the UNIX user base.

Why glibc's fstat() is slow

Posted Sep 14, 2023 20:00 UTC (Thu) by izbyshev (guest, #107996) [Link] (6 responses)

For newer architectures (riscv32, loongarch), the simpler fstat syscall doesn't exist at all[1], and statx is the only option to implement fstat, forcing the useless path check. So no, it's not just a glibc problem, at least if we're looking beyond x86-64.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

Why glibc's fstat() is slow

Posted Sep 15, 2023 6:23 UTC (Fri) by eru (subscriber, #2753) [Link] (4 responses)

I find it very strange that this depends on the CPU architecture at all. The fstat() and its relatives work on the file system, which is at much higher level than what architecture differences would naturally affect.

Why glibc's fstat() is slow

Posted Sep 15, 2023 14:08 UTC (Fri) by izbyshev (guest, #107996) [Link] (3 responses)

This is not due to architectural differences, but due to the standard kernel policy of supporting only the most general version of the syscall[1]. In this case, apparently, performance implications of having only statx were not considered.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

Why glibc's fstat() is slow

Posted Sep 15, 2023 14:51 UTC (Fri) by hmh (subscriber, #3838) [Link] (2 responses)

Looks like it would be easy enough to fix, if there's a will to do so. In fact, now would be the ideal time to do it, since you also want to ensure any libc that is ported to those arches will try to use the faster syscall instead of the generic one [where available].

I wonder if people looked at what glibc would use when they were bootstrapping these two arches, and decided to not implement the more specialized, less-generic (but faster!) members of the stat family of syscalls because glibc would not use them...

Why glibc's fstat() is slow

Posted Sep 18, 2023 13:13 UTC (Mon) by eru (subscriber, #2753) [Link]

> I wonder if people looked at what glibc would use when they were bootstrapping these two arches,

What I don't get is why would they have to think about that code at all when bootstrapping. There is an architecturally neutral implementation of fstat() and friends, and all the person working on the porting should need to do is to connect each system call implementation to the low-level mechanism the particular architecture employs to jump from userland to the kernel.

Why glibc's fstat() is slow

Posted Oct 7, 2023 14:54 UTC (Sat) by izbyshev (guest, #107996) [Link]

I looked a bit further, and it turns out that the original statx implementation did support passing NULL filename to emulate fstat, but this support was explicitly dropped in [1].

It feels really weird that Linus bashed glibc for "being silly" but merged this change... OTOH back in 2017 archs without fstat syscall didn't exist, so it wasn't a problem for C libraries.

[1] https://git.kernel.org/torvalds/c/1e2f82d1e9d12223b4cbd1f...

Why glibc's fstat() is slow

Posted Sep 25, 2023 18:29 UTC (Mon) by BenHutchings (subscriber, #37955) [Link]

The statx() system call is also the only Y2038-safe stat system call on 32-bit architectures (aside from x32).

AT_EMPTY_PATH and NULL

Posted Sep 14, 2023 20:48 UTC (Thu) by geofft (subscriber, #59789) [Link]

> One might think that it makes no sense to even look at the path when the user has provided a flag (AT_EMPTY_PATH) that says there is nothing to be seen there but, as Al Viro pointed out, POSIX mandates this behavior.

I think this is a misreading of Viro's (terse) message. He doesn't say anything about POSIX, the Linux man page says that AT_EMPTY_PATH is a non-portable Linux extension, and just to be sure I don't see it in POSIX at all.

I think what he meant is that Linux itself effectively mandates this behavior, in that the current implementation of fstatat(fd, NULL, &stat_buf, AT_EMPTY_PATH) is to return -EFAULT, and so userspace libraries like glibc don't have the option of setting it to NULL. That is, the problem is not that glibc is being weird by passing a non-NULL pointer to an empty string that the kernel has to go look at, the problem is that glibc has no choice but to pass a non-NULL pointer to an empty string.

But it seems to me (as a non-expert) that this is something Linux could relax - if you notice that the pointer is NULL, check for AT_EMPTY_PATH before throwing -EFAULT. It wouldn't help glibc for a long while, but it would be worth doing for several years from now, when glibc has inevitably dropped support for existing kernel versions for other reasons.

To the more general point, that the kernel shouldn't be looking at the pointer at all when AT_EMPTY_PATH is specified even if it's non-NULL, as the parent messages in that subthread point out, the kernel defines a specific non-error behavior for that case, which is to ignore AT_EMPTY_PATH and actually use the provided path, and that behavior could not be changed without technically breaking backwards compatibility for userspace - even though it's hard to imagine that anyone is relying on it. But I believe an error case (getting -EFAULT when passing NULL) can be turned into a non-error case without violating the kernel's promises about breaking userspace.
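For anyone wanting to see how a given kernel treats the NULL case, a small probe along these lines might do. It is a sketch only: the function name is hypothetical, it uses the raw system call so that the library wrapper does not get in the way, and it assumes an architecture (such as x86-64) that defines SYS_newfstatat. The result depends on the running kernel: EFAULT matches the behavior described above, while 0 would indicate a kernel that has relaxed the check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe: call newfstatat() with a NULL path and AT_EMPTY_PATH.
 * Returns 0 if the kernel accepted the call, otherwise the errno value
 * (EFAULT on kernels behaving as described above; ENOSYS if this
 * architecture does not define the entry point).
 */
int probe_null_empty_path(int fd)
{
#ifdef SYS_newfstatat
    struct stat st;

    /* Cast so a pointer-sized NULL is passed through the variadic call. */
    if (syscall(SYS_newfstatat, fd, (const char *)NULL, &st,
                AT_EMPTY_PATH) == 0)
        return 0;
    return errno;
#else
    (void)fd;
    return ENOSYS;
#endif
}
```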

Why glibc's fstat() is slow

Posted Sep 15, 2023 6:38 UTC (Fri) by NN (subscriber, #163788) [Link]

On an adjacent topic, GNU coreutils also uses fstat in a way which seems unnecessary, e.g., compare the usage in its version of wc: https://github.com/coreutils/coreutils/blob/master/src/wc.c, heavily wrapping and processing fstat, with the busybox version (https://github.com/brgl/busybox/blob/master/libbb/wfopen_...), which reads from stdin by default or if it receives a "-" as the sole arg (see the definition of fopen_or_warn_stdin here: https://github.com/brgl/busybox/blob/master/libbb/wfopen_...).

Why glibc's fstat() is slow

Posted Sep 15, 2023 8:58 UTC (Fri) by Homer512 (subscriber, #85295) [Link] (1 responses)

So … does nobody run statistics on what system calls are used? Not even as a measure to see where effort should be focused? Or just to see where "linux system still boots fine" counts as basic test coverage?

Why glibc's fstat() is slow

Posted Sep 15, 2023 13:41 UTC (Fri) by smoogen (subscriber, #97) [Link]

There are plenty of statistics run by different groups, all focused on their own use cases. The issue is that use cases vary all over and you end up with people focusing efforts sometimes on 'speed' and sometimes on 'this gets used a lot a little differently.. maybe just make the call handle more cases'. Then you end up cleaning up some places for one thing, and other places for another.

Why glibc's fstat() is slow

Posted Sep 15, 2023 9:18 UTC (Fri) by dottedmag (subscriber, #18590) [Link] (1 responses)

Every time I see "this is to simplify code" I flinch. People so often lose sight of what they are doing: any software project is producing software to be used by the users, and that's the main goal. The secondary, subordinate, goal is to make the developers' lives as easy as possible. For such a fundamental library as libc any inefficiency is multiplied a gazillion times every day. In my opinion it is cruel to put developers' time above users' in this particular codebase.

Admittedly, C is such a footgun that sacrificing some performance in some rare corner cases to significantly decrease amount of code is acceptable, but selecting one syscall over another in stat, is it really so complicated?

P.S: I'm not ordering anyone to do anything, I'm posting my observations and describing the attitude I try to put into my work. I thank all maintainers who put their work into the code we all benefit from, often without remuneration and in their spare time, and I try to pay it back in kind.

Why glibc's fstat() is slow

Posted Sep 16, 2023 15:14 UTC (Sat) by Paf (subscriber, #91811) [Link]

I mean, they didn’t realize it was slower.

Simpler code is a net good, freeing up time and making bugs less likely, and if you can get it without trade offs, you should do it. It seems reasonable to believe the perf would be ~the same, which they did.

So maybe they need better testing but they didn’t do anything *wrong* and certainly didn’t knowingly prioritize convenience over performance.

Why glibc's fstat() is slow

Posted Sep 15, 2023 13:49 UTC (Fri) by walters (subscriber, #7396) [Link]

I glanced at the rustix linux-raw backend, and it seems to get it right:

https://github.com/bytecodealliance/rustix/blob/93c95474e...

Why glibc's fstat() is slow

Posted Sep 16, 2023 7:20 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (1 responses)

I'm not sure if our beloved editor understands naming.

The answer to all this confusion is:

newnewfstat(...)

Clearly ;)

Why glibc's fstat() is slow

Posted Sep 21, 2023 15:13 UTC (Thu) by gutschke (subscriber, #27910) [Link]

You're not taking all the possible future needs for extensions into account.

We really should standardize on names like: fstat_new$_2006revB.2_2023revA()

Just increment dates and revisions every time you propose a patch, and document the history by keeping track of precursors of the updated API.

I'm torn between this proposal and adding full Git hashes to all symbols...

/s

patch merging status - Why glibc's fstat() is slow

Posted Sep 28, 2023 14:16 UTC (Thu) by fabiop (guest, #24661) [Link]

glibc patch was merged yesterday:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=55...
I suppose master branch is for glibc 2.39, right? (Other branches still don't have the patch)

kernel side, the patch only looks to be in 6.6:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
But still not in 6.1.55, 6.4.16 or 6.5.5.


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds