readdir_r() deprecated?

Posted Aug 5, 2016 2:23 UTC (Fri) by abatters (✭ supporter ✭, #6932)
Parent article: The GNU C Library version 2.24 is now available

readdir_r() is thread-safe; readdir() is not thread-safe AFAIK. Why did they deprecate the thread-safe version in favor of the unsafe one? What is a multi-threaded program supposed to do? Put a global lock around every use? What about libraries?

Bewildered and confused...

readdir_r() deprecated?

Posted Aug 5, 2016 3:15 UTC (Fri) by ehiggs (subscriber, #90713) [Link] (15 responses)

The updated man page has more information on the issues with readdir_r:

http://man7.org/linux/man-pages/man3/readdir_r.3.html

For convenience, the interesting bit for you is as follows:

* On systems where NAME_MAX is undefined, calling readdir_r() may be
unsafe because the interface does not allow the caller to specify
the length of the buffer used for the returned directory entry.

* On some systems, readdir_r() can't read directory entries with
very long names. When the glibc implementation encounters such a
name, readdir_r() fails with the error ENAMETOOLONG after the
final directory entry has been read. On some other systems,
readdir_r() may return a success status, but the returned d_name
field may not be null terminated or may be truncated.

* In the current POSIX.1 specification (POSIX.1-2008), readdir(3) is
not required to be thread-safe. However, in modern
implementations (including the glibc implementation), concurrent
calls to readdir(3) that specify different directory streams are
thread-safe. Therefore, the use of readdir_r() is generally
unnecessary in multithreaded programs. In cases where multiple
threads must read from the same directory stream, using readdir(3)
with external synchronization is still preferable to the use of
readdir_r(), for the reasons given in the points above.

* It is expected that a future version of POSIX.1 will make
readdir_r() obsolete, and require that readdir() be thread-safe
when concurrently employed on different directory streams.
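
To make the "external synchronization" case from the third point concrete, here is a minimal sketch of sharing one directory stream between threads. The names `struct shared_dir` and `shared_dir_next` are made up for illustration and are not part of any library. The point it tries to show is that the name is copied out while the lock is still held, since a later readdir() on the same stream may overwrite the returned entry.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: one directory stream shared by several threads. */
struct shared_dir {
    DIR *dp;
    pthread_mutex_t lock;
};

/*
 * Copy the name of the next entry into buf (buflen bytes).
 * Returns 1 on success, 0 at end of directory, -1 on error with errno set.
 * An entry whose name does not fit in buf is consumed and reported as
 * ENAMETOOLONG.
 */
int shared_dir_next(struct shared_dir *sd, char *buf, size_t buflen)
{
    int result;
    int saved_errno = 0;

    assert(sd != NULL && buf != NULL);

    pthread_mutex_lock(&sd->lock);
    errno = 0;  /* readdir() returns NULL for both end-of-directory and error */
    struct dirent *ent = readdir(sd->dp);
    if (ent == NULL) {
        saved_errno = errno;
        result = (saved_errno == 0) ? 0 : -1;
    } else if (strlen(ent->d_name) >= buflen) {
        saved_errno = ENAMETOOLONG;  /* caller's buffer is too small */
        result = -1;
    } else {
        /* Copy under the lock: the next readdir() may overwrite *ent. */
        memcpy(buf, ent->d_name, strlen(ent->d_name) + 1);
        result = 1;
    }
    pthread_mutex_unlock(&sd->lock);

    if (result == -1)
        errno = saved_errno;
    return result;
}
```

If each thread opens its own stream with opendir(), none of this is needed on the implementations the man page describes; plain readdir() is enough.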

readdir_r() deprecated?

Posted Aug 5, 2016 15:02 UTC (Fri) by cruff (subscriber, #7201) [Link]

Thanks for posting that description. I was bewildered also, having used readdir_r extensively in multithreaded programs.

readdir_r() deprecated?

Posted Aug 5, 2016 19:44 UTC (Fri) by wahern (guest, #37304) [Link] (13 responses)

Application code isn't supposed to use NAME_MAX, but fpathconf(dirfd(dp), _PC_NAME_MAX). Indeed, both Solaris and Linux (or, at least, glibc GNU folks) push developers away from using NAME_MAX, PATH_MAX, etc, out of a dislike for fixed limits on filesystem names. Solaris has some convoluted syscall APIs for returning various data structures with dynamic (and effectively unbounded) sizes from kernel to user space.
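
For anyone who hasn't seen it, a rough sketch of that query follows. The helper names `dir_name_max` and `dirent_buf_size` are mine, and the fallback of 255 when no limit is reported is purely an assumption for illustration. The size formula is the one the readdir_r man page suggests for sizing the entry buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* for dirfd() */

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Ask the filesystem behind dp for its maximum filename length.
 * Returns the limit, or -1 on error with errno set.
 */
long dir_name_max(DIR *dp)
{
    assert(dp != NULL);

    errno = 0;
    long name_max = fpathconf(dirfd(dp), _PC_NAME_MAX);
    if (name_max == -1) {
        if (errno != 0)
            return -1;  /* genuine error */
        return 255;     /* assumption: fallback guess when no limit is reported */
    }
    return name_max;
}

/* Bytes needed for a struct dirent that can hold a name of name_max bytes. */
size_t dirent_buf_size(long name_max)
{
    assert(name_max >= 0);
    return offsetof(struct dirent, d_name) + (size_t)name_max + 1;
}
```

A caller would do something like `malloc(dirent_buf_size(dir_name_max(dp)))` after checking for -1.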

I think that's misguided. For better or worse, a 255-character limit on traditional filesystems is here to stay. And while we can all easily _imagine_ scenarios where we could use (or abuse) longer file names, we all know that it's not very practical. (We could always up it to 1024 and call it a day.) In the end I think the complexities outweigh any possible benefit, especially given that the software interface with the traditional filesystem is effectively moving further and further down the stack. That doesn't make it any less important, but it means stability and consistency should be emphasized over flexibility. And in any event, other environments like Windows, OS X, and POSIX itself have much more onerous limitations to their interfaces, either in terms of length or content of filenames. So this isn't an area where sane people would attempt to push the limits when solving problems, at least not in a way which relies on filenames for storing arbitrary data.

Given the current security realities, I think it's better to focus on stabilizing and simplifying the syscall and libc landscape. While interfaces like readdir hide the complexity from application code, that's beside the point--there have been plenty of bugs and exploits in both glibc and the kernel related to their interface implementations. Arbitrary limits abound in software and that will never change. We shouldn't try to support dynamic limits at all costs, especially when the relative benefit is so minuscule and speculative.

That said, it's obviously far easier to use readdir than readdir_r. And, yeah, most implementations are thread safe. But at this point Linux is the only OS keeping up-to-date with POSIX, largely because Red Hat employees dominate the committee. The BSDs are falling further behind, and commercial vendors like Solaris and AIX effectively stopped caring altogether. Both AIX and Solaris seem to have focused on Linux/glibc compatibility (whether or not a POSIX addition) and other practical and usability concerns.

The thing is, if people never tried compiling their software on, e.g., Solaris, they would have never realized that NAME_MAX is not always defined. And while it's rare to see software that actually uses readdir_r, that only proves that few people paid attention to the issues to begin with. While we can sometimes improve the situation, as in this case by making readdir thread safe, we'll never be able to excise all the code that directly or indirectly relies on NAME_MAX, PATH_MAX, and similar limits. Heck, those constants influenced (and in some cases were influenced by) fixed limits in various RFCs. We'll never be able to leave them behind. I'd rather see us embrace them so we can work to simplify code bases.

I'm making far more of this than it deserves. Mostly I'm responding to the opinion that seems to be widely held in some circles that we should support arbitrary limits in filesystem paths and similar kernel interfaces. While I don't agree with David Wheeler that we should restrict filenames to, e.g., UTF-8-encoded graphic characters (with some additional constraints on whitespace), I basically agree with the general sentiment. Secure interfaces are simple interfaces; simple interfaces often require drawing strict lines; and strict lines are sometimes arbitrary. So be it. Moreover, in this area the lines have already been drawn by historical practice, and intentionally or unintentionally baked into the architecture of almost all software and software standards.

readdir_r() deprecated?

Posted Aug 5, 2016 20:43 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

Windows allows up to 32k characters in a filename through Win32 APIs.

readdir_r() deprecated?

Posted Aug 5, 2016 21:39 UTC (Fri) by Sesse (subscriber, #53779) [Link] (5 responses)

I don't know the details, but I know .NET developers who had to move their projects into c:\p (from c:\Documents and Settings\Full name of the user\Visual Studio projects, or whatever) because they ended up hitting some path length limit. So in practice, at least with Microsoft's own tools and quite recently (3–4 years), the limit is much lower than 32k.

readdir_r() deprecated?

Posted Aug 5, 2016 21:55 UTC (Fri) by barryascott (subscriber, #80640) [Link]

The Windows limit traditionally was 257: 255 for the path + 2 for the c: drive.
Using UNC syntax allows the approx 32k path limit. UNC paths start with \\ and are basically unfriendly for humans.
You would need a nice UI to hide the \\ geeky stuff.

readdir_r() deprecated?

Posted Aug 6, 2016 0:45 UTC (Sat) by compenguy (guest, #25359) [Link] (1 responses)

You only get the very long paths/filenames on windows when calling into the unicode APIs, which is not AIUI what .NET does.

From experience I can say that the "DOS" commandline tools on windows *cannot* be coaxed into accepting the longer paths. I wound up re-implementing them in python (pycp and pymkdir) in order to manipulate very long paths in windows.

readdir_r() deprecated?

Posted Aug 6, 2016 10:19 UTC (Sat) by k8to (guest, #15413) [Link]

It's worse than that. You have to use the unicode apis (which you should generally do on windows), and you have to prepend the magic long-pathname cookie, "\\?\", to your pathnames. Thus UNC long pathnames become garbage like "\\?\\\servername\share\dir\dir\filename" and suchlike.

It's pretty awful to have to shove this garbage into the path strings. Code later has to know to take them back out again when showing them to users, printing them to error logs, etc.
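
A tiny sketch of the "take them back out" step, just to show the kind of helper that ends up scattered around. `display_path` is a hypothetical name, it uses narrow strings for brevity even though real code on windows would be working with wide strings, and it only considers the plain drive-letter case.

```c
#include <assert.h>
#include <string.h>

/* The long-pathname cookie, written as a C string literal. */
static const char LONG_PATH_PREFIX[] = "\\\\?\\";

/*
 * Return the part of path that is fit to show to a user: the same string
 * with the cookie skipped if present. No copy is made; the result points
 * into the caller's buffer.
 */
const char *display_path(const char *path)
{
    assert(path != NULL);

    size_t n = sizeof LONG_PATH_PREFIX - 1;  /* length without the NUL */
    if (strncmp(path, LONG_PATH_PREFIX, n) == 0)
        return path + n;
    return path;
}
```

Every error message, log line, and UI label has to remember to go through something like this.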

readdir_r() deprecated?

Posted Aug 6, 2016 12:51 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

Visual Studio project file paths are limited to some low ceiling and project filenames are limited to 237 or so (with the .vcxproj extension). Also, if your build system doesn't use response files, many of the compiler tools can hit the command line length limit with the MSVC toolchain if you have many include directories.

readdir_r() deprecated?

Posted Aug 6, 2016 17:25 UTC (Sat) by Mook (subscriber, #71173) [Link]

On the happy side, as of last week with .net 4.6.2 it's now possible to use longer path names. Sometimes, maybe. (It looks like it might possibly work with Windows 10 with the recent update, since there's an OS-side switch involved too.) Assuming nothing else in the code between the user and the disk gets in the way.

It most definitely was limited to ~260 up to a few years ago, though.

readdir_r() deprecated?

Posted Aug 8, 2016 7:01 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

Perhaps things have changed, but back in win7 it was some win32 apis, and not others. So I ended up with weird situations where you could write out filepaths that you couldn't read with many programs (at the time, 7zip). So it was even more of a mess than just prepending special characters.

And curious whether the above is fine with path limits or filename limits being 256? Paths can easily exceed 256...

readdir_r() deprecated?

Posted Aug 6, 2016 17:13 UTC (Sat) by ehiggs (subscriber, #90713) [Link] (1 responses)

"Mostly I'm responding to the opinion that seems to be widely held in some circles that we should support arbitrary limits in filesystem paths and similar kernel interfaces."

Well of course we need to have *some* restriction otherwise resource exhaustion becomes an attack vector. And then it indeed becomes arbitrary.

readdir_r() deprecated?

Posted Aug 6, 2016 19:42 UTC (Sat) by lsl (subscriber, #86508) [Link]

"No arbitrary limit" is generally understood to mean "limited only by available resources/memory".

Preventing an untrusted entity from exhausting resources is probably more reliably done by limiting the resources available to it, regardless of what they're used for.

readdir_r() deprecated?

Posted Aug 7, 2016 18:36 UTC (Sun) by xtifr (guest, #143) [Link] (2 responses)

> For better or worse, a 255-character limit on traditional filesystems is here to stay.

Speaking as someone who has fixed code to satisfy the requests of Hurd developers, the Hurd, as I understand it, does *not* have any arbitrary pathname limits. And, while you or I might be skeptical about the long-term prospects of the Hurd, I doubt you'll persuade the Gnu project to ignore it in their designs and plans.

readdir_r() deprecated?

Posted Aug 10, 2016 23:53 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Indeed, 'no arbitrary limits' has been in the GNU Coding Standards for decades, in direct reaction to the Unix tradition of just using randomly fixed-size buffers everywh

readdir_r() deprecated?

Posted Aug 11, 2016 15:19 UTC (Thu) by zlynx (guest, #2285) [Link]

Nice.

:-)

readdir_r() deprecated?

Posted Aug 5, 2016 21:18 UTC (Fri) by ballombe (subscriber, #9523) [Link] (3 responses)

I assume glibc's readdir uses thread-local storage, which makes readdir_r obsolete.

readdir_r() deprecated?

Posted Aug 5, 2016 23:52 UTC (Fri) by lsl (subscriber, #86508) [Link] (1 responses)

Why would you need TLS when readdir takes an opaque context argument anyway?

readdir_r() deprecated?

Posted Aug 9, 2016 15:26 UTC (Tue) by vapier (guest, #15768) [Link]

it doesn't use TLS (ignoring of course errno). thread safety is achieved by having a lock embedded in the opaque DIR structure, and then readdir grabs/releases that lock.

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/...
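
To illustrate the general idea (this is NOT glibc's code, just a toy with made-up names), a stream type can carry its own lock so that callers don't have to do anything special:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque stream type; not glibc's DIR. */
struct toy_stream {
    pthread_mutex_t lock;       /* protects pos */
    const char *const *names;   /* entries to hand out */
    size_t count;
    size_t pos;
};

void toy_stream_init(struct toy_stream *s,
                     const char *const *names, size_t count)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->names = names;
    s->count = count;
    s->pos = 0;
}

/* Return the next entry, or NULL when the stream is exhausted. */
const char *toy_stream_next(struct toy_stream *s)
{
    const char *name = NULL;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);   /* grab the embedded lock */
    if (s->pos < s->count)
        name = s->names[s->pos++];
    pthread_mutex_unlock(&s->lock); /* release it before returning */
    return name;
}
```

The lock only keeps the stream's internal state consistent. In this toy the returned pointers stay valid, but with a real readdir the returned entry can be overwritten by a later call on the same stream, so threads sharing one stream still need to coordinate.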

readdir_r() deprecated?

Posted Nov 8, 2017 0:17 UTC (Wed) by martinkunev (guest, #119485) [Link]

This means that to write a portable program, macro switches have to be used so that readdir() is called when the platform has glibc and readdir_r() when the platform has other versions of the standard library. How convenient...