The shrinking role of ETXTBSY

By Jonathan Corbet
August 19, 2021
Unix-like systems abound with ways to confuse new users, many of which have been present since long before Linux entered the scene. One consistent source of befuddlement is the "text file is busy" (ETXTBSY) error message that is delivered in response to an attempt to overwrite an executable image file. Linux is far less likely to deliver ETXTBSY results than it once was, but they do still happen on occasion. Recent work to simplify the mechanism behind ETXTBSY has raised a more fundamental question: does this error check have any value at all?

The "text" that is busy in this case refers to a program's executable code — it's text that is read by the CPU rather than by humans. When a program is run, its executable text is mapped into the running process's address space. When this happens, Unix systems have traditionally prevented the file containing that text from being modified; the alternative is to allow the code being run to be changed arbitrarily, which rarely leads to happy outcomes. For extra fun, the changed code will only be read if it is faulted into RAM, meaning that said unhappy outcomes might not happen until hours (or days) after the file has been overwritten. Rather than repeatedly explain to users why their programs have crashed in mysterious ways, Unix kernel developers chose many years ago to freeze the underlying file while those programs run — leading to the need to explain ETXTBSY errors instead.

Perhaps the easiest way to generate such an error is to try to rebuild a program while some process is still running it. Developers (those working in compiled languages, anyway) tend to learn early on to respond to "text file busy" errors by killing off the program they are debugging and rerunning make.
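The scenario can be reproduced without a compiler. The following is a minimal sketch in Python, assuming a Linux system; the choice of /bin/sleep and the helper names are illustrative assumptions, not anything prescribed by the kernel. It runs a private copy of a program and tries to open that copy for writing while it is still running, reporting whatever errno comes back rather than assuming the outcome.

```python
import errno
import os
import shutil
import subprocess
import tempfile


def try_open_for_write(path):
    """Try to open path for writing without truncating it.

    Returns None on success, or the errno of the failure.
    """
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


def demo(program="/bin/sleep"):
    """Run a private copy of `program` and try to write to it meanwhile.

    `program` is a hypothetical choice; any long-running binary would do.
    """
    with tempfile.TemporaryDirectory() as scratch:
        copy = os.path.join(scratch, "victim")
        shutil.copy(program, copy)      # copies mode bits, so it stays executable
        child = subprocess.Popen([copy, "30"])
        try:
            result = try_open_for_write(copy)
        finally:
            child.kill()
            child.wait()
    if result == errno.ETXTBSY:
        return "ETXTBSY: text file busy"
    if result is None:
        return "open succeeded (no write blocking on this kernel)"
    return os.strerror(result)
```

Calling demo() on a kernel that enforces the check should report the busy text file; the same open on a file that nobody is executing simply succeeds.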

How it works

Deep within the kernel, the inode structure is used to represent files; one field within that structure is an atomic_t called i_writecount. Normally, this field can be thought of as a count of the number of times that the file is held open for writing. If, however, i_writecount is less than zero, it is interpreted instead as a count of the number of times that the writing of this file is being blocked. If the file is an executable file, then each process that runs it will decrement i_writecount for the duration of that execution. This field thus functions as a sort of simple lock. If its value is negative, the file cannot be opened for write access; if, instead, its value is positive, attempts to block write access will fail. (Similarly, an attempt to execute a file that is currently open for writing will fail with ETXTBSY.)
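The counter's two-sided meaning is easier to see in a toy model. The sketch below is user-space Python, not kernel code: the method names are borrowed from the kernel's helpers, but the bodies are a simplification (the kernel uses atomic operations on the inode, and nothing here touches a real file).

```python
import errno


class WriteCount:
    """User-space toy model of the i_writecount convention.

    Positive: number of writers. Negative: number of write-denying users.
    """

    def __init__(self):
        self.count = 0

    def get_write_access(self):
        """A writer opens the file; refused while writes are denied."""
        if self.count < 0:
            return -errno.ETXTBSY
        self.count += 1
        return 0

    def put_write_access(self):
        """A writer closes the file."""
        assert self.count > 0
        self.count -= 1

    def deny_write_access(self):
        """An execution starts; refused while any writer holds the file open."""
        if self.count > 0:
            return -errno.ETXTBSY
        self.count -= 1
        return 0

    def allow_write_access(self):
        """An execution ends."""
        assert self.count < 0
        self.count += 1
```

Two concurrent executions leave the model's count at -2, and a writer is refused until both have called allow_write_access(); conversely, one open writer is enough to make deny_write_access() fail.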

In current kernels, it is possible to attempt to block write access with a call to deny_write_access(), but the more common way is to create a memory mapping with the VM_DENYWRITE flag set. So, for example, the execve() system call will map the code sections of the executable file into memory with VM_DENYWRITE; that mapping causes i_writecount to be decremented (this will fail if the file is open for writing, of course). When the mapping goes away (the running program exits or calls execve()), i_writecount will be incremented again; if it reaches zero, the file will once again become writable.

Back in the early days of Linux, prior to the Git era, the mmap() system call supported a flag called MAP_DENYWRITE that would cause VM_DENYWRITE to be set within the kernel and thus block write access to the mapped file for the duration of the mapping. There was a problem with this option, though: any process that could open a file for read access could map it with MAP_DENYWRITE and prevent any other process on the system from writing that file. That is, at best, an invitation to denial-of-service attacks, so it was removed long ago. Calls to mmap() with that flag set will succeed, but the flag is simply ignored.
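That the flag is accepted but ignored can be checked from user space. This is a small sketch, assuming Python's mmap module on Linux (where it exposes MAP_DENYWRITE); the function name is made up for illustration, and on platforms lacking the constant the code falls back to a plain private mapping.

```python
import mmap
import os
import tempfile

# Python exposes the flag on Linux; fall back to 0 (no extra flag) elsewhere.
MAP_DENYWRITE = getattr(mmap, "MAP_DENYWRITE", 0)


def write_open_succeeds_while_mapped():
    """Map a scratch file with MAP_DENYWRITE, then try to open it for writing."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "data")
        with open(path, "wb") as f:
            f.write(b"x" * 4096)
        with open(path, "rb") as f:
            # The mapping keeps its own reference; closing f does not unmap it.
            mapping = mmap.mmap(
                f.fileno(),
                0,
                flags=mmap.MAP_PRIVATE | MAP_DENYWRITE,
                prot=mmap.PROT_READ,
            )
        try:
            fd = os.open(path, os.O_WRONLY)   # would fail if writes were blocked
        except OSError:
            return False
        else:
            os.close(fd)
            return True
        finally:
            mapping.close()
```

On a current kernel the function is expected to return True: the mapping is created, and the later open for writing is not refused.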

Shared libraries

The removal of MAP_DENYWRITE had an interesting, if obscure, side effect. One may think of a file, such as /usr/bin/cat, as containing an executable program. In truth, though, much of the code that will be executed when somebody runs cat is not found in that file; instead, it is in a vast number of shared libraries. Those files contain executable code just like the nominal executable file, so one would think that they, too, would be protected from writing while in use.

Once upon a time, that was indeed the case; the ancient uselib() system call will map libraries with writing blocked. It may well be, though, that there are no systems still using uselib(); instead, on current systems, shared libraries are mapped from user space with mmap(). The MAP_DENYWRITE flag was created for just this use case, so that shared libraries could not be written while in use. When MAP_DENYWRITE went away, so did that protection; current Linux systems will happily allow a suitably privileged user to overwrite in-use, shared libraries.
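To get a feel for how much executable code lives outside the main binary, one can look at a process's mappings. The sketch below parses text in the format of /proc/<pid>/maps (address, permissions, offset, device, inode, pathname) and collects the files mapped with execute permission; the function name is an assumption for illustration.

```python
def executable_mappings(maps_text):
    """Return the set of file paths mapped with execute permission.

    `maps_text` is the content of a /proc/<pid>/maps file, whose lines look like:
    address perms offset dev inode pathname
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue                      # anonymous mapping, no pathname
        perms, path = fields[1], fields[5]
        # Skip pseudo-entries such as [vdso] or [heap]; keep real files only.
        if "x" in perms and path.startswith("/"):
            paths.add(path)
    return paths
```

Feeding it the contents of /proc/self/maps from a typical dynamically linked program will usually list the C library and the dynamic loader alongside the program itself; only the last of those is covered by ETXTBSY.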

The end result of this history is that the memory-management subsystem has a bunch of leftover code, in the form of the support for MAP_DENYWRITE and VM_DENYWRITE, that no longer has any real purpose. So David Hildenbrand decided to take it out. With this patch set installed, execve() will simply call deny_write_access() directly, and mmap() no longer has to consider that case at all. This results in a user-space API change: uselib() no longer blocks write access to shared libraries. Nobody expects anybody to notice.

An idea whose time has passed?

In response to Hildenbrand's patch set, GNU C library developer Florian Weimer pointed out that the library has "a persistent issue with people using cp (or similar tools) to replace system libraries". He did not say that library developers have long since tired of explaining to those users why their applications crashed in mysterious ways, but there was no need to. It would be nice, he said, to provide a way to prevent this sort of error or, at least, a way to deterministically tell that a crash was caused by an overwritten library. There are a number of ways that could be established without bringing back MAP_DENYWRITE, he said.

The discussion wandered into other ways to protect shared libraries from being overwritten while in use; Eric Biederman suggested installing them with the immutable bit set, for example. But Linus Torvalds made it clear that he thought the problem was elsewhere:

The kernel ETXTBUSY thing is purely a courtesy feature, and as people have noticed it only really works for the main executable because of various reasons. It's not something user space should even rely on, it's more of a "ok, you're doing something incredibly stupid, and we'll help you avoid shooting yourself in the foot when we notice".

After Torvalds repeated that point a couple of times, Andy Lutomirski suggested just removing the write-blocking mechanism altogether:

It’s at best erratic — it only applies for static binaries, and it has never once saved me from a problem I care about. If the program I’m recompiling crashes, I don’t care — it’s probably already part way through dying from an unrelated fatal signal. What actually happens is that I see -ETXTBUSY, think “wait, this isn’t Windows, why are there file sharing rules,” then think “wait, Linux has *one* half baked file sharing rule,” and go on with my life.

Torvalds was amenable to the idea, though he worried that some application somewhere might depend on the ETXTBSY behavior. But he noted that it has been steadily weakened over time, and nobody has complained so far. Removing it could be tried, he continued: "Worst comes to worst, we'll have to put it back, but at least we'd know what crazy thing still wants it".

Al Viro worried, though, that some installation scripts might depend on this behavior; Christian Brauner added that allowing busy executable files to be written could make some security exploits easier. Hildenbrand said that his patch set already makes the write-blocking behavior much simpler, and that he would be in favor of leaving it in place for now. The second version of the patch set, posted on August 16, retains the ETXTBSY behavior for the main executable file.

Hildenbrand's simplification work seems sure to land during the 5.15 merge window; whether ETXTBSY will disappear entirely is rather less certain. Getting rid of it strikes some developers as a nice cleanup, but there is nothing forcing that removal to happen at this time. Meanwhile, the potential for user-space regressions always exists when behavior is changed in this way. The safe approach is thus to leave ETXTBSY in place for now.

[Postscript: Lutomirski pointed to mandatory locks as the one other place in the kernel that implements unwelcome file-sharing rules. That feature is indeed unpopular; the kernel document on mandatory locks starts with a section on why they should not be used. In 2015, a configuration option was added to make mandatory locks optional, and some distributors have duly disabled them. One potential outcome of the ETXTBSY discussion looks likely to be an effort to get other distributors to do the same until it becomes clear that mandatory locks can safely be removed. Stay tuned.]




The shrinking role of ETXTBSY

Posted Aug 19, 2021 15:49 UTC (Thu) by Sesse (subscriber, #53779) [Link] (21 responses)

Can we make cp actually DTRT (create a new file, atomically rename) instead of overwriting the file? :-)
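For reference, the "create a new file, atomically rename" pattern looks roughly like this. It is a sketch in Python with a made-up function name, not a proposal for how cp itself should be written; note that it needs write access to the containing directory and does not carry over the old file's mode or ownership.

```python
import os
import tempfile


def replace_atomically(path, data):
    """Write `data` to a temporary file next to `path`, then rename over it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # needs write access to the directory
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
            out.flush()
            os.fsync(out.fileno())              # data on disk before the rename
        os.replace(tmp, path)                   # rename(2): atomic on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A process that already has the old file open (or mapped) keeps seeing the old inode and its content; only new opens of the path get the new data.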

The shrinking role of ETXTBSY

Posted Aug 19, 2021 16:05 UTC (Thu) by chris_se (subscriber, #99706) [Link] (13 responses)

The problem with that is that you might have write permissions on the file, but not necessarily the containing directory.

It would be great if the kernel had an open flag that could be combined with O_TRUNC that atomically replaces the file (assuming you would be able to open it without that flag), but keeps the old file's permissions / creation date / etc. -- and if the old file was still open it would be treated as if the old file had been deleted. (Similar in effect to how the create new file + atomic rename would work.) cp could then use that flag, same thing with compilers/linkers, which would also avoid the issues described in the article. And on platforms / older kernels that don't provide the flag the same tools could behave the same as before, as if the flag hadn't been set.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 18:40 UTC (Thu) by ibukanov (subscriber, #3942) [Link] (4 responses)

And without the flag the kernel should return ETXTBSY as before. This way tools can be gradually updated with a new option to support this flag that can be even default.

File version numbers à la OpenVMS?

Posted Aug 29, 2021 23:55 UTC (Sun) by skissane (subscriber, #38675) [Link] (3 responses)

File version numbers, as found in OpenVMS (and a bunch of other DEC family operating systems before it) would be a cool solution to this. If an inode can contain multiple data versions, overwriting an in-use executable could create a new data version for the inode. Existing file descriptors would point to the old data version but newly opened file descriptors would point to the new one. The old data version could be automatically deleted when its last open file descriptor closed.

More generally, one could use this to enable what I've heard some people call "unit files". (I don't know if that is standard terminology at all, I heard it from some commenter on HN.) Basically, a unit file acts atomically – when you open it, you get a read snapshot of the file at the time you opened it. When you write to it, your changes aren't visible to other readers until you close the file (or call some kind of "commit" system call). (This would likely imply only one process can have the file open for writing at a time). Many apps would really benefit from these kind of transactional semantics, right now they have to use something like SQLite to get them, but with unit files they could get those semantics directly from the OS.

Of course, I doubt Linux is going to add any such features. It is complicated, probably wouldn't be used that much, doesn't fit well with POSIX, could not be supported by existing filesystems without substantial changes (although a few do have some existing COW support, and they might be able to leverage that to implement this feature with only modest effort). But one can daydream about a parallel universe in which UNIX never really took off, and we are all using some open-source clone of VMS. In some ways no doubt an uglier universe, but in other ways a prettier one.

File version numbers à la OpenVMS?

Posted Aug 31, 2021 13:20 UTC (Tue) by Wol (subscriber, #4433) [Link] (2 responses)

> Many apps would really benefit from these kind of transactional semantics, right now they have to use something like SQLite to get them, but with unit files they could get those semantics directly from the OS.

Again harping on Pr1mos, and I don't know how easy it would be to retrofit to Linux, but files had reader/writer access controls. I don't know whether it was set by the first application to open it, or more likely set in the file system, but you had a choice of "multiple readers (or one writer)", "multiple readers and one writer", and "multiple readers and multiple writers".

So, as it implies, the first person to open the file got it, any other attempts were checked against this lock and the open succeeded or failed depending. When I wrote an accounts package, everything was configured NR&1W, all files were opened read-only unless actually updating the data, and all files were opened in a defined order to prevent deadlocks. (Still didn't prevent one user changing the data underneath another, but the program made sure this didn't matter...)

Cheers,
Wol

File version numbers à la OpenVMS?

Posted Sep 16, 2024 17:38 UTC (Mon) by jch (guest, #51929) [Link] (1 responses)

> Again harping on Pr1mos, and I don't know how easy it would be to retrofit to Linux, but files had reader/writer access controls. I don't know whether it was set by the first application to open it, or more likely set in the file system, but you had a choice of "multiple readers (or one writer)", "multiple readers and one writer", and "multiple readers and multiple writers".

Aren't those just mandatory file locks taken at open time?

File version numbers à la OpenVMS?

Posted Sep 16, 2024 19:12 UTC (Mon) by Wol (subscriber, #4433) [Link]

Reading that again, I'm not sure if I made myself clear, but these were file system attributes. So you had "NR-1W" (either readers OR writer), "NR&1W" (as many readers as you liked, only one writer), and "NR&NW" (as many readers and writers as you liked).

So my accounts system had "NR&1W" set on all files, and only ever opened a file to write when it was doing a commit. There was also always an explicit open hierarchy (as in I only ever opened individual clients after opening the client summary, so I couldn't get a deadlock, same for other ledgers).

So it relied on programming discipline, but could be proven to work if the rules were followed.

Cheers,
Wol

The shrinking role of ETXTBSY

Posted Aug 19, 2021 22:21 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (6 responses)

> and if the old file was still open it would be treated as if the old file had been deleted

I would think that this would require assigning a new inode number to keep the content separate. Even in your description you refer to "old file" and "new file", not "old content" and "new content". If you change the inode number, however, then you need to update the directory entry, which involves writing to the directory, so we're back to the original permission issue.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 23:15 UTC (Thu) by SLi (subscriber, #53131) [Link] (4 responses)

Would it cause problems if such an operation by a user with write permission to the file but not the directory could patch the new inode number to the directory entry?

The shrinking role of ETXTBSY

Posted Aug 20, 2021 5:57 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (2 responses)

Updating the directory entry would be a user-visible change since getdents(2), readdir(3), and stat(2) all report the inode number; more importantly it could result in breaking hardlinks (as the atomic rename approach would) which is something that normally requires write access to the containing directory. Beyond that I'm not sure how much trouble would result from allowing just the inode number to be updated in an otherwise read-only directory entry… it's not something that's seen much testing.
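The hard-link breakage is easy to observe with the existing rename-based approach. A small sketch, with hypothetical file names, that gives one file two names and then replaces one of them by rename:

```python
import os
import tempfile


def hardlink_survives_rename_replace():
    """Return True if both names still refer to the same inode afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        name = os.path.join(scratch, "name")
        alias = os.path.join(scratch, "alias")
        staged = os.path.join(scratch, "staged")
        with open(name, "w") as f:
            f.write("old")
        os.link(name, alias)                 # two names, one inode
        with open(staged, "w") as f:
            f.write("new")
        os.replace(staged, name)             # `name` now refers to a new inode
        return os.path.samefile(name, alias)
```

The function returns False: after the rename, "alias" still holds the old content under the old inode number, while "name" points at the new one.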

Atomically swapping the *content* of two files (one of which could be temporary/unlinked) could be a useful operation for some cases which currently rely on atomic rename, and wouldn't have any issue with read-only directories, but it wouldn't help in this particular situation since any existing open file descriptions or mapped memory would immediately see the new content just as if the file had been modified with ftruncate(2) and write(2).

The shrinking role of ETXTBSY

Posted Aug 20, 2021 17:47 UTC (Fri) by smurf (subscriber, #17840) [Link] (1 responses)

Yeah, but you could extend the atomic swap to *also* swap the file descriptor of A to point to B instead.

This will cause all kinds of interesting race conditions unless handled carefully … but it could work.

The shrinking role of ETXTBSY

Posted Aug 22, 2021 8:35 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Truncating a file already causes all kinds of interesting race conditions. But they're in userspace, so we all collectively pretend that they're not a problem.

(Because, well, what's the alternative? Any global mutable state will inevitably race, at some level of abstraction, and global mutable state is literally the whole point of a filesystem.)

The shrinking role of ETXTBSY

Posted Aug 20, 2021 6:42 UTC (Fri) by pbonzini (subscriber, #60935) [Link]

Yes, the operation would break hard links. Not having write permissions for the directory implies not being able to break hard links.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 10:55 UTC (Fri) by runekock (subscriber, #50229) [Link]

Would it be possible to assign the new inode number to the old contents instead?

The shrinking role of ETXTBSY

Posted Aug 24, 2021 19:21 UTC (Tue) by flussence (guest, #85566) [Link]

I'm imagining a hypothetical O_REPLACE flag here, which gives the opening process a filehandle that shadows the existing file, and its effects won't become visible to the outside world until the opener explicitly invokes fsync() or close(). Similar atomicity semantics to the tempfile-write-sync-rename pattern, but without the tempfile juggling and the extra problems that causes.

Admittedly it doesn't sound all that easy to implement something like this, even to a kernel layperson.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 16:37 UTC (Thu) by mbunkus (subscriber, #87248) [Link] (4 responses)

You might not have enough space on the device for that, and checking whether or not that's the case beforehand is impossible to do reliably (what about file holes: are there any, does the FS support them; or simply that other processes might write to the same FS at the same time).

The shrinking role of ETXTBSY

Posted Aug 19, 2021 16:41 UTC (Thu) by Sesse (subscriber, #53779) [Link] (3 responses)

Sure, but that also goes when overwriting the file.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 17:38 UTC (Thu) by mbunkus (subscriber, #87248) [Link] (2 responses)

I should have been clearer on what I mean. Of course I'm not talking about new_file being bigger than (old_file + free_before_copying). That wouldn't fit either way.

I'm talking about new_file being smaller than (old_file + free_before_copying) but being larger than free_before_copying. If you copy-to-temporary with atomic-rename you'll run out of space whereas directly opening the target file for writing will work.

Take this dummy 1MB file system with one 800KB file occupying it, trying to overwrite the 800KB file with a 400KB file:

[0 root@sweet-chili /home/mosu/tmp/mp] df .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 996 797 128 87% /home/mosu/tmp/mp
[0 root@sweet-chili /home/mosu/tmp/mp] ls -l
total 796
-rw-r--r-- 1 root root 800000 2021-08-19 19:30 800KB
[0 root@sweet-chili /home/mosu/tmp/mp] ls -l ../*KB
-rw-r--r-- 1 root root 400000 2021-08-19 19:30 ../400KB
-rw-r--r-- 1 root root 800000 2021-08-19 19:30 ../800KB
[0 root@sweet-chili /home/mosu/tmp/mp] cp ../400KB .
cp: error writing './400KB': No space left on device
[1 root@sweet-chili /home/mosu/tmp/mp] rm 400KB
[0 root@sweet-chili /home/mosu/tmp/mp] cp ../400KB 800KB
[0 root@sweet-chili /home/mosu/tmp/mp] ls -l
total 404
-rw-r--r-- 1 root root 400000 2021-08-19 19:33 800KB
[0 root@sweet-chili /home/mosu/tmp/mp]
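The arithmetic behind the transcript can be written down as two checks. This is a simplified sketch that ignores block rounding and filesystem metadata; the function names are invented for illustration.

```python
def fits_in_place(new_size, old_size, free):
    """Overwriting in place releases the old blocks (O_TRUNC) before writing."""
    return new_size <= old_size + free


def fits_via_temp_and_rename(new_size, free):
    """The old file still occupies its blocks while the new copy is written."""
    return new_size <= free


# Numbers from the transcript: 128 1K-blocks available, 800000-byte old file,
# 400000-byte new file.
free_bytes = 128 * 1024
in_place_ok = fits_in_place(400000, 800000, free_bytes)
temp_rename_ok = fits_via_temp_and_rename(400000, free_bytes)
```

With those numbers the in-place check passes and the temporary-file check fails, matching the "No space left on device" error seen when the 400KB file was written alongside the 800KB one.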

rsync does the copy-to-temporary with atomic-rename, but it has always done so & is therefore fine. Of course it does have options to deal with low-space file systems the user can turn on, notably "--delete-before".

For cp it would be a huge change in semantics.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 17:49 UTC (Fri) by smurf (subscriber, #17840) [Link] (1 responses)

You can tell rsync to do the writing in place, though. This vastly speeds up some sync jobs, particularly databases.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 18:19 UTC (Fri) by mbunkus (subscriber, #87248) [Link]

Yep, rsync wasn't the best additional example here; in fact, efficient in-place updates are what rsync is most known for.

My main point was that such a change would change cp's semantics. I shouldn't have brought rsync up.

The shrinking role of ETXTBSY

Posted Aug 21, 2021 7:15 UTC (Sat) by Homer512 (subscriber, #85295) [Link]

Unfortunately, that would break existing setups that rely on the default being overwriting and expect the new content to show up in the old file descriptor.

The shrinking role of ETXTBSY

Posted Aug 30, 2021 16:03 UTC (Mon) by nivedita76 (subscriber, #121790) [Link]

Does any installer use cp, though? They should already be doing the right thing, no?

What’s the use case for doing cp overwriting an existing executable or shared library?

Interesting discussion on ETXTBSY

Posted Aug 19, 2021 16:34 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link] (11 responses)

This is an interesting discussion on trade-offs.

Clearly there's value in making the implementation code simpler (easier to maintain, more likely to be correct, etc.).

However, while it may be "purely a courtesy feature", courtesy is nice. Humans make an incredible number of mistakes; detecting & countering some ongoing footguns *can* be helpful, even if the detection only works in some cases. Especially since it's been that way "from the beginning"; applications might depend on it. It's not a security mechanism, so it doesn't need to counter "all cases" to be useful. I think this is best left in.

Interesting discussion on ETXTBSY

Posted Aug 21, 2021 23:28 UTC (Sat) by neilbrown (subscriber, #359) [Link] (9 responses)

> I think this is best left in.

I tend to agree. Further, I think it would be good to make it even more useful.
Currently only executables benefit from ETXTBSY. What if we added an O_DENYWRITE open flag?
Then shared-library loaders could set this, as could script interpreters.

With a bit of work, we could make O_DENYWRITE | O_RDWR work so that the opener can write but nothing else can.
This would be great for swap files.

Obviously O_DENYWRITE would require write-permission to the file, thus avoiding the DoS problems of MAP_DENYWRITE.

Interesting discussion on ETXTBSY

Posted Aug 22, 2021 21:53 UTC (Sun) by neilbrown (subscriber, #359) [Link] (7 responses)

> Obviously O_DENYWRITE would require write-permission to the file,

and obviously that would make it useless for shared-library loaders and script interpreters.

So I've completely changed my mind. Only permissions should be effective at preventing a process from writing to a file. A correctly written program should choose not to write to a file that is in-use.
This would require a way to find out if a file is in use. If there are needs in this area, that is the direction that we should innovate.

Maybe shared-library loaders and script interpreters should take a LOCK_SH flock on the file, and 'cp' should replace-instead-of-overwrite if it cannot get a LOCK_EX flock.
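A rough sketch of that cooperative scheme, using the existing flock() interface through Python's fcntl module. The function names and the "overwrite"/"replace" return values are assumptions for illustration; the decision is also inherently racy, since the exclusive lock is dropped as soon as the check is done.

```python
import fcntl
import os


def try_lock(fd, exclusive):
    """Attempt a non-blocking flock(); return True if the lock was granted."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def overwrite_or_replace(path):
    """Decide how a cooperating copier would proceed under this scheme."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return "overwrite" if try_lock(fd, exclusive=True) else "replace"
    finally:
        os.close(fd)      # closing the descriptor also drops any lock it held
```

A loader would call try_lock(fd, exclusive=False) on the library it is about to map and keep that descriptor open; while it does, overwrite_or_replace() on the same file yields "replace".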

Interesting discussion on ETXTBSY

Posted Aug 23, 2021 0:40 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (6 responses)

No. A thousand times no.

I do not want my tools doing two completely different things depending on some entirely unrelated state. A given cp invocation should always overwrite, or it should always replace. It must not inspect the phase of the moon to decide which code path to use. That's just an outage waiting to happen. If you want it to support both use cases, add a flag and require the user to be explicit.

Interesting discussion on ETXTBSY

Posted Aug 23, 2021 1:03 UTC (Mon) by neilbrown (subscriber, #359) [Link] (4 responses)

> I do not want my tools doing two completely different things depending on some entirely unrelated state.

And yet .... they already do.

"mv" will prefer the rename() systemcall, but if that returns EXDEV, it will fall back to copy-and-remove-original.
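In outline, that fallback looks something like the following Python sketch (the function name and return values are illustrative; Python's own shutil.move() behaves similarly):

```python
import errno
import os
import shutil


def move(src, dst):
    """Rename if possible; copy and remove the original across filesystems."""
    try:
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    shutil.copy2(src, dst)    # keeps timestamps and mode where it can
    os.unlink(src)
    return "copied"
```

Which of the two paths is taken depends only on whether source and destination are on the same filesystem, something the caller never states explicitly.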

But maybe EXDEV is not as "unrelated" as a flock?

Interesting discussion on ETXTBSY

Posted Aug 23, 2021 1:26 UTC (Mon) by Wol (subscriber, #4433) [Link] (1 responses)

And so, very annoyingly, does cp.

If the target is a pre-existing directory, it copies into it. If the target is a pre-existing file it overwrites it. And if the target doesn't exist, it makes a copy called the target.

In other words, if you are copying directories in a script, you need a whole bunch of wrapper code if you want the results to be reproducible.

Cheers,
Wol

Interesting discussion on ETXTBSY

Posted Aug 23, 2021 5:01 UTC (Mon) by mchapman (subscriber, #66589) [Link]

> In other words, if you are copying directories in a script, you need a whole bunch of wrapper code if you want the results to be reproducible.

You can use cp -R source/. dest/, which will do the right thing whether or not dest is an existing directory.

With GNU cp, you can also do cp -R --no-target-directory source/ dest/.

Interesting discussion on ETXTBSY

Posted Aug 23, 2021 15:40 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

> And yet .... they already do.

I find it baffling that you seem to tacitly assume the existing design is perfect. It's obviously not. The -T flag, for example, only needs to exist because cp and mv "helpfully" default to interpreting the last argument as -t if it is possible to do so. If you had to explicitly pass -t, then -T would not need to exist, and a whole class of bugs would have been eliminated.

Interesting discussion on ETXTBSY

Posted Aug 23, 2021 22:19 UTC (Mon) by neilbrown (subscriber, #359) [Link]

> I find it baffling that you seem to tacitly assume the existing design is perfect.

I did not mean to imply that, and am sorry that it came across that way.
What I was trying to highlight was that your stated desire about behaviour of tools was already thwarted by reality. The desire came across as a bit naive.

I certainly see the attraction of having simple tools with simple semantics - "Do one thing and do it well". However I know from experience that such tools are usually *too* simple. Different people have different opinions about what the "one thing" should be, and about what it means to do it "well".

There is, and should be, a tension between adding functionality and maintaining simplicity. Both have value and there is no perfect balance. Rather we iteratively find a dynamic balance through robust "conversation" (which involves sharing both opinions and code).

I didn't imagine my proposal about how 'cp' could handle locking would be the final word on the subject, just another contribution to an ongoing conversation. Thanks for adding to the conversation!

Interesting discussion on ETXTBSY

Posted Aug 24, 2021 16:23 UTC (Tue) by immibis (subscriber, #105511) [Link]

Then the kernel should do it instead. Mapping an executable or library file should be a copy-on-write operation. When the underlying file is replaced the kernel should use the old pages for the existing mappings and the new pages for new mappings.

Of course, that's probably impossible in current Linux.

Interesting discussion on ETXTBSY

Posted Sep 12, 2021 18:45 UTC (Sun) by nix (subscriber, #2304) [Link]

> Currently only executables benefit from ETXTBSY. What if we added an O_DENYWRITE open flag.

What if we arranged for ETXTBSY to be emitted by default for files which have pages mapped executable in some process's address space? (The flag would obviously be flipped by the first PROT_EXEC mmap() of that file, and flipped off by munmap). This is, obviously, a change in semantics, but surely not even jitters rely on writing to executable file-backed pages (differing cache coherency semantics make this operation intrinsically nonportable in any case).

This prevents the "easy DoS" of open-with-a-flag-to-deny-overwriting -- the only DoS vector now available is the one we already have, "execute something and nobody else can overwrite it while you're executing from it", which seems entirely, y'know, desirable.

Interesting discussion on ETXTBSY

Posted Aug 22, 2021 8:42 UTC (Sun) by niner (subscriber, #26151) [Link]

For what it's worth, that "courtesy" has not helped me even once in 25 years. On the contrary, I regularly run into it when debugging and it has me hunting through my open terminal windows looking for that gdb session that's still running. Like many other cases where the computer thinks it's smarter than me, I could do very well without this.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 16:44 UTC (Thu) by mb (subscriber, #50428) [Link] (11 responses)

How do the major package managers (deb/rpm) update (executable-) files?

I would have expected them to use some kind of atomic-rename to avoid in-between states. As far as I understand it, this would avoid ETXTBSY (and also be crash-safe).

And what about install (1)?

The shrinking role of ETXTBSY

Posted Aug 19, 2021 20:56 UTC (Thu) by andyc (subscriber, #1130) [Link] (1 responses)

With rpm at least, you will generally use the install(1) command.

install(1) by default will do an unlink(2) then a copy. Thus the old executable's inode remains in place until the program exits.
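The unlink-then-copy sequence can be sketched as follows. This is an illustration of the idea in Python with an invented function name, not a transcription of what install(1) actually does internally; unlike the rename approach, it leaves a short window during which the destination does not exist.

```python
import os
import shutil


def install_file(src, dst, mode=0o755):
    """Unlink `dst` if present, then copy `src` into a freshly created file."""
    try:
        os.unlink(dst)                       # running processes keep the old inode
    except FileNotFoundError:
        pass
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    os.chmod(dst, mode)                      # apply the mode regardless of umask
```

Because the old name is removed rather than opened for writing, the write never targets the inode that is being executed, so ETXTBSY does not come into play.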

The shrinking role of ETXTBSY

Posted Aug 20, 2021 0:41 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

During the `%install` step, sure. But I can't imagine `librpm` is forking off thousands of `install(1)` processes during *package* installation rather than going through some more direct syscall dance route.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 23:50 UTC (Thu) by smcv (subscriber, #53363) [Link] (5 responses)

dpkg writes all the new content to new files (foo.dpkg-new, I think), then does an atomic-overwrite via rename. I would expect any package manager that modifies the system in-place to have to do the same.

I think dpkg actually goes further than that, by hard-linking all the old content to a backup name (foo.dpkg-old), then renaming all the new files so foo.dpkg-new replaces foo, and finally deleting all the foo.dpkg-old - so that if it gets an error halfway through unpacking a package, it can use all the foo.dpkg-old files to roll back to the old version in a consistent state.
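The three-phase sequence described here might be sketched as below. The .dpkg-new and .dpkg-old suffixes follow the description above; everything else (the function name, the dictionary argument, the error handling) is an assumption, and none of it is taken from dpkg's source.

```python
import os


def unpack(files):
    """Replace several files all-or-nothing; `files` maps path -> new bytes."""
    for path, data in files.items():
        with open(path + ".dpkg-new", "wb") as out:
            out.write(data)
            out.flush()
            os.fsync(out.fileno())
    backed_up = []
    try:
        for path in files:
            if os.path.exists(path):
                os.link(path, path + ".dpkg-old")   # keep old content reachable
                backed_up.append(path)
        for path in files:
            os.replace(path + ".dpkg-new", path)    # atomic switch per file
    except OSError:
        for path in backed_up:
            old = path + ".dpkg-old"
            os.replace(old, path)    # restore; a no-op if path was never replaced
            if os.path.exists(old):
                os.unlink(old)
        raise
    for path in backed_up:
        os.unlink(path + ".dpkg-old")               # commit: drop the backups
```

On failure this sketch restores the old versions but leaves any staged .dpkg-new files behind for inspection; a real package manager would also have to cope with crashes between the phases.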

The shrinking role of ETXTBSY

Posted Aug 20, 2021 6:04 UTC (Fri) by mb (subscriber, #50428) [Link]

Cool, thanks to all of you. Now I understand where dpkg-{new,old} come from. :)

The shrinking role of ETXTBSY

Posted Aug 20, 2021 8:04 UTC (Fri) by Sesse (subscriber, #53779) [Link] (3 responses)

Even more, it also writes the information to a journal (redo log) of its own, which is fsync()-ed to disk, so if there's a power loss, it can roll forward on next run. This is one of several reasons why dpkg unpacking takes longer than just unpacking a tarball. (It is my opinion that there are needlessly many fsync-s in this process and that it can be improved upon, but that's a different story.)

The shrinking role of ETXTBSY

Posted Aug 20, 2021 17:52 UTC (Fri) by smurf (subscriber, #17840) [Link] (2 responses)

"eatmydata dpkg …" / "eatmydata apt …" to the rescue.

Yes this means that a power failure / kernel crash halfway through your update is likely to kill your system, but when that is sufficiently unlikely …

NB: WARNING: do not use this command for anything that triggers a reboot. Ever.

The shrinking role of ETXTBSY

Posted Aug 21, 2021 11:51 UTC (Sat) by Jonno (subscriber, #49613) [Link]

> "eatmydata dpkg …" / "eatmydata apt …" to the rescue.

Or you could use `dpkg --force-unsafe-io ...` / `apt --option DPkg::options::=--force-unsafe-io ...`.

That will only remove the fsync before renaming file X.dpkg-new to X, which gives you about 99% of the speed boost from eatmydata, without risking the apt and dpkg databases and logs.

(Meaning that if something does break you can inspect the logs and simply reinstall any packages that were unpacked since your last system-wide sync to fix your installation. If the broken package is essential it might be a bit tricky to do, but you can always boot from a different boot medium, such as the Debian installer CD, and run `dpkg --instdir=/mnt/target ...` to get going again).
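
For readers wondering what "the fsync before renaming" buys, here is a rough Python sketch of the pattern with the fsync made optional. It is not dpkg's implementation; the function name and the `sync` parameter are invented, and the directory fsync at the end is an addition of this sketch rather than something described in this thread.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes, sync: bool = True) -> None:
    """Write data to a temporary file and rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            if sync:
                # Force the new content to stable storage before the name
                # switches over. This is the step skipped when sync=False.
                os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if sync:
        # Assumption of this sketch: also persist the directory entry.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

With `sync=False` the rename is still atomic for running processes; what is lost is the guarantee that, after a crash, the name points to fully written data.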

The shrinking role of ETXTBSY

Posted Sep 16, 2024 9:15 UTC (Mon) by MarcB (guest, #101804) [Link]

Reboot should be fine, the kernel will flush all buffers before unmount. It is only crashes or power losses that would be a problem.

Unix atomic actions, or how to replace an executable or library

Posted Aug 26, 2021 16:00 UTC (Thu) by davecb (subscriber, #1574) [Link] (2 responses)

Unix was designed with a minimum number of ways to do anything, and a pair of 'atomics'. From as far back as v6,
* open had the O_CREAT flag, which would atomically create a file, or fail if it existed
* mv atomically replaced the inode under a filename, allowing new openers of the file to get the new version, and everyone with an open copy to continue using the old one.

If you think the latter sounds like the read-copy-update Linux uses inside the kernel, you'd be right: it comes from the same seminal papers.

The latter is the canonical way to update a file or library: I used it heavily on Unix and Solaris (I was part of the shared library team), update apparently uses it and I considered it one of the really impressive things about Unix-derived systems. And it fits in nicely as part of the "everything is a file" metaphor.

--dave
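
The first of the two atomic operations in the list above can be tried from user space. A minimal sketch, assuming a POSIX system: in today's API the create-or-fail behavior is requested by combining O_CREAT with O_EXCL. The function name is invented for the example.

```python
import os
import tempfile


def create_exclusive(path: str) -> bool:
    """Atomically create path; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL: the existence check and the creation are one step,
        # so two racing callers cannot both succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

This is the usual basis for lock files: whoever gets `True` owns the lock.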

Unix atomic actions, or how to replace an executable or library

Posted Aug 26, 2021 16:05 UTC (Thu) by davecb (subscriber, #1574) [Link] (1 responses)

Just out of curiosity, does the Linux run-time linker decrement the i_writecount in the library inode? Solaris did...

If so, that suggests that Linux could have the same "you're about to mess up one of your processes" warning if someone inadvertently attempted to use cp or rsync to update an in-use library, just like an in-use program.

Unix atomic actions, or how to replace an executable or library

Posted Sep 12, 2021 18:51 UTC (Sun) by nix (subscriber, #2304) [Link]

> Just out of curiosity, does the Linux run-time linker decrement the i_writecount in the library inode? Solaris did...

No. I'm fairly sure there is no way to do so: there was, but that was MAP_DENYWRITE, which, well, see this article... glibc is still careful to mmap its libraries with MAP_DENYWRITE, which the kernel then ignores :(

The shrinking role of ETXTBSY

Posted Aug 19, 2021 18:32 UTC (Thu) by evgeny (subscriber, #774) [Link] (12 responses)

> Developers (those working in compiled languages, anyway) tend to learn early on to respond to "text file busy" errors by killing off the program they are debugging and rerunning make.

Weird. I've never encountered such an error, though I have certainly recompiled a program many times while it was running - either directly or under gdb. I feel like living in a parallel universe...

The shrinking role of ETXTBSY

Posted Aug 19, 2021 18:51 UTC (Thu) by Tomasu (guest, #39889) [Link] (5 responses)

I've seen it but not much lately. Maybe newer linkers do the move and replace trick these days? So you'd be most likely to see it if your build steps include a manual cp of the executable.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 19:10 UTC (Thu) by evgeny (subscriber, #774) [Link] (4 responses)

Ah, I see. Interesting, so indeed the linker (or is it make itself?) is smart enough to first remove the old file...

The shrinking role of ETXTBSY

Posted Aug 19, 2021 23:13 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (3 responses)

Make doesn't know anything about removing or copying files. It only knows how to:

0. Parse a Makefile and do file-level* dependency resolution based on the contents.
1. Check the modification times of the input and output files, and identify what needs to be rebuilt.
2. Run the commands you tell it to run in order to rebuild something.

All other aspects of the build process are the responsibility of the specific commands you tell Make to run. Technically, you don't even need to use it to compile software, if you have some other process that can automatically transform input files into output files in a similar fashion to compilation.

* i.e. it has no integration with package managers etc., and instead it just knows things like "file X.o 'depends on' file X.c, because you told me that *.o files are made from *.c files of the same name."
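
Steps 1 and 2 of the list above fit in a few lines. What follows is a toy sketch and not how any real make is implemented; the function names are invented, and dependency parsing (step 0) is left out entirely.

```python
import os
import subprocess


def needs_rebuild(target: str, prerequisites: list) -> bool:
    """True if target is missing or older than any prerequisite."""
    try:
        target_mtime = os.stat(target).st_mtime_ns
    except FileNotFoundError:
        return True
    return any(os.stat(p).st_mtime_ns > target_mtime for p in prerequisites)


def build(target: str, prerequisites: list, recipe: str):
    """Run the recipe through the shell if needed; return its exit code or None."""
    if not needs_rebuild(target, prerequisites):
        return None
    # The recipe, not this code, decides how the output file gets written
    # (in place, unlink first, or write-and-rename).
    return subprocess.run(recipe, shell=True).returncode
```

Whether a rebuild hits ETXTBSY therefore depends entirely on how the command in the recipe opens its output file.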

The shrinking role of ETXTBSY

Posted Aug 20, 2021 6:26 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (2 responses)

> Make doesn't know anything about removing or copying files.

That isn't strictly true, at least with respect to GNU make. There are a number of built-in patterns. For example, in a directory containing only a source file "test.c" (and no makefile), if you run "make 'libtest.a(test.o)'", GNU make will invoke the following commands automatically:

cc -c -o test.o test.c
ar rv libtest.a test.o
rm test.o

So without being told it knows at least how to compile .c files into .o files, add .o files to a static library, and remove the temporary .o files afterward. There is also at least one built-in implicit rule (%.out: %) which involves copying files: if you run "make test.c.out" it will invoke "cp test.c test.c.out". And if you have a file "script.sh" and run "make script" it will copy "script.sh" to "script"—using cat(1) and redirection rather than cp(1)—and then mark the resulting file as executable.

Besides the built-in rules, GNU make also removes any intermediate files generated when multiple implicit rules are chained to build a target: <https://www.gnu.org/software/make/manual/html_node/Chaine...>.

You can list all the built-in rules available on your system with "make -f /dev/null -p". There is also a partial list in the manual: <https://www.gnu.org/software/make/manual/html_node/Catalo...>.

The shrinking role of ETXTBSY

Posted Aug 23, 2021 1:02 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

OK, I suppose I was a bit vague, so let me be more precise:

Make doesn't have any code which directly calls rename(2), unlink(2), etc. for the purposes of executing a recipe.* Recipes are (effectively) very small shell scripts, and it is rm(1) or cp(1) (or in the case of open(2) for a redirection, the shell) which actually makes those syscalls on make's behalf. Make doesn't "know" anything about how to copy a file, it just "knows" to run cp(1) and check the exit code.

* I have not looked, but I suppose it is theoretically possible that there is some code path where make creates a temporary file or something like that. That's not what I'm talking about here. I'm talking about the files that the user cares about, i.e. the ones that actually appear on the command line or in a recipe, whether explicit or implicit.

The shrinking role of ETXTBSY

Posted Aug 23, 2021 14:11 UTC (Mon) by skitt (subscriber, #5367) [Link]

This isn’t specific to GNU Make, it’s been a feature of Make ever since Stuart Feldman’s first version (see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3...) and is part of POSIX (http://pubs.opengroup.org/onlinepubs/9699919799/utilities...).

The shrinking role of ETXTBSY

Posted Aug 19, 2021 19:03 UTC (Thu) by clugstj (subscriber, #4020) [Link] (1 responses)

Well, if you build on one machine and test on another (which you should do - the code working on the development machine isn't good testing procedure), you'll see this when you "scp" to the test machine.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 19:28 UTC (Thu) by evgeny (subscriber, #774) [Link]

I do frequently test on a different machine. But the catch is I use a shared NFS partition, which eliminates the need to (s)cp and hence I've never seen this error.

The shrinking role of ETXTBSY

Posted Aug 28, 2021 18:38 UTC (Sat) by gnu_lorien (subscriber, #44036) [Link] (3 responses)

"I feel like living in a parallel universe..."

I had the same feeling when reading this article.

I had an experience in a university class around 2006 where the professor claimed that you couldn't overwrite in-use files. I used Gentoo at the time and had never needed to shut down anything to recompile and update my system. The professor insisted this could happen but I couldn't find any scenario from the userspace tools I had available where I couldn't update in-use files.

I've also worked at companies that do cross-platform work for at least 15 years. It's a regular occurrence that things break for the Windows toolchain that don't break for the GNU/Linux toolchain because Linux toolchains will happily update in-use binaries. It's one of the things I consider a huge misfeature of Windows that you have to go hunting down whoever is locking a file in order to finish a build. From a user perspective I would be shocked and annoyed if I started seeing ETXTBSY show up. I say good riddance.

The shrinking role of ETXTBSY

Posted Aug 30, 2021 21:20 UTC (Mon) by dancol (guest, #142293) [Link] (2 responses)

No, the Windows way to do things is the right one. You shouldn't be able to mess up a running program by mucking with its binary from underneath. On Windows, you can *move* the old file out of the way and then create a new file, which is what you should be doing on a Unix system too. Windows just enforces good hygiene.

The shrinking role of ETXTBSY

Posted Aug 31, 2021 4:33 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

You can rewrite the running file on Windows. You can't truncate or delete it, though.

This has more to do with the way the memory mapping is managed on Windows than with anything else.

The shrinking role of ETXTBSY

Posted Aug 31, 2021 12:59 UTC (Tue) by Wol (subscriber, #4433) [Link]

Pr1mos added the ability to rename an open file.

Dunno exactly how it worked, but it was something like this: if the linker hit an open file, it knew the magic sauce to rename it out of the way, then write a new one.

Same idea as *nix's ability to delete an open file then put a new one in its place - anything which has the old file open will continue to use that until it closes it.

Cheers,
Wol

The shrinking role of ETXTBSY

Posted Aug 19, 2021 19:07 UTC (Thu) by clugstj (subscriber, #4020) [Link] (1 responses)

With all the security paranoia of late, I'd expected that everyone would be clamoring to have the shared libraries locked while they are being executed.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 20:15 UTC (Thu) by iabervon (subscriber, #722) [Link]

Programs that can write to a shared library can generally already do worse (or just replace it with one that will compromise new runs of setuid programs), whereas programs that are securely (or just robustly) written and intend to update shared libraries replace the inode.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 19:22 UTC (Thu) by walters (subscriber, #7396) [Link]

What's truly fun here is that none of this applies to interpreters; in particular, because e.g. `bash` actually interprets the file as it's reading it, if you do "truncate and overwrite in place" as e.g. `cp` does, you can get *totally arbitrary* behavior: https://thomask.sdf.org/blog/2019/11/09/take-care-editing...
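
The difference between the two update styles is easy to reproduce. In this sketch a raw file descriptor that reads one line at a time plays the role of the interpreter; it is only an illustration, and the script contents and function names are invented.

```python
import os
import tempfile

OLD = b"echo one\necho two\n"   # two 9-byte lines
NEW = b"echo ONE\necho TWO\n"


def overwrite_in_place(path: str, data: bytes) -> None:
    with open(path, "wb") as f:       # truncates and rewrites the same inode
        f.write(data)


def replace_by_rename(path: str, data: bytes) -> None:
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:        # new inode
        f.write(data)
    os.replace(tmp, path)             # the name now points at the new inode


def second_line_seen(update) -> bytes:
    """Read line 1, let `update` change the script, then read line 2."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "script.sh")
        with open(path, "wb") as f:
            f.write(OLD)
        fd = os.open(path, os.O_RDONLY)
        try:
            os.read(fd, 9)            # the "interpreter" consumes the first line
            update(path, NEW)
            return os.read(fd, 9)     # what it would execute next
        finally:
            os.close(fd)
```

With `overwrite_in_place` the reader continues into the new content at its old offset; with `replace_by_rename` it finishes the old script undisturbed.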

I tried hard to argue for O_OBJECT a while ago: https://marc.info/?l=linux-fsdevel&m=139963046823575&... I still think it makes sense.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 20:47 UTC (Thu) by willy (subscriber, #9762) [Link] (9 responses)

> For extra fun, the changed code will only be read if it is faulted into RAM, meaning that said unhappy outcomes might not happen until hours (or days) after the file has been overwritten.

Well, hmm, no?

Assuming we're on a local filesystem (ie not NFSv3 or something), the write() goes directly into the page cache. Even if the application has used MAP_PRIVATE, that covers how to handle a store from the mmapper, not a write() from somebody else.

So the code does change under you. Now, I don't think we necessarily flush the CPU instruction cache at that point, so you might continue to execute some old instructions for a while, but at some point the CPU is going to notice that the i$ is out of date.

Unless you O_TRUNC, of course. Then, umm ... we get rid of all those pages immediately and your program segfaults straight away.

The shrinking role of ETXTBSY

Posted Aug 19, 2021 23:20 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (3 responses)

> Unless you O_TRUNC, of course. Then, umm ... we get rid of all those pages immediately and your program segfaults straight away.

You mean to tell me that I can instantly segfault an entire system by just running sudo truncate libc.so?

I mean, I'm not that surprised, there are loads of ways a malicious or stupid root can break the system. I'm just impressed by the "segfault every userspace process at the same time" angle.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 0:33 UTC (Fri) by Karellen (subscriber, #67644) [Link] (1 responses)

Well, you can instantly kill every userspace process and panic the kernel at the same time on a system with sudo kill -9 -1 1

* Probably. I have not just tried this.

The shrinking role of ETXTBSY

Posted Aug 23, 2021 12:17 UTC (Mon) by anselm (subscriber, #2796) [Link]

You can't kill -9 1; the init process is protected from signals for which it doesn't have an explicitly installed signal handler.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 8:32 UTC (Fri) by taladar (subscriber, #68407) [Link]

Technically statically compiled programs and those written in languages that do not use libc (e.g. Go) would not be affected.

The shrinking role of ETXTBSY

Posted Aug 22, 2021 22:02 UTC (Sun) by Paf (subscriber, #91811) [Link] (1 responses)

I think you’re missing a step here.

So, the interpreter doesn’t read in a copy or anything? Unless it’s mmaping it’s going to have a copy of at least part of the file. That’s how read() works.

User space doesn’t work from the page cache unless it’s mmaping.

The shrinking role of ETXTBSY

Posted Aug 22, 2021 22:49 UTC (Sun) by willy (subscriber, #9762) [Link]

Run `cat /proc/self/maps`.

My system shows the 'cat' binary mapped five times. One is executable.

But thanks for explaining to me how the read() system call and the page cache work.
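
To count such mappings programmatically, the text of /proc/self/maps can be parsed. A small sketch, assuming the usual Linux layout of six whitespace-separated fields per line (address range, permissions, offset, device, inode, pathname); the function name is invented.

```python
def mappings_of(maps_text: str, path: str) -> list:
    """Return the permission strings of every mapping backed by `path`."""
    perms = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname field and are skipped.
        if len(fields) == 6 and fields[5].rstrip() == path:
            perms.append(fields[1])
    return perms
```

On a Linux system, something like `mappings_of(open("/proc/self/maps").read(), "/usr/bin/cat")` run from inside that binary would list one entry per mapping; an `x` in the permission string marks the executable one.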

The shrinking role of ETXTBSY

Posted Aug 23, 2021 9:06 UTC (Mon) by anton (subscriber, #25547) [Link] (2 responses)

> Even if the application has used MAP_PRIVATE, that covers how to handle a store from the mmapper, not a write() from somebody else.

That's disturbing. Still, if a MAP_PRIVATE page is written to (e.g., with its original content) in one place, it is copied on that write, and later changes to the original don't affect it.

So that is what could be done on writing to an executed text file: in every affected process, make private copies of the pages of the whole original text (as if on copy-on-write) and populate the mapping with them. There is an opportunity for sharing between several processes that run the same changed binary, but I guess that the benefit is too small and too rare to make that effort. OTOH, the benefit of not having ETXTBSY and not having processes crash when their binary changes is more substantial IMO.

Actually, I would like that also for interpreters (I have had a number of shell scripts crash when I edited them while they were running), maybe by making MAP_PRIVATE|MAP_POPULATE behave that way, or with an additional flag to mmap().
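
The copy-on-write behavior mentioned at the start of this comment can be checked with a private mapping. A sketch in Python, where `mmap.ACCESS_COPY` requests a MAP_PRIVATE mapping; the function name is invented. What an untouched private page shows after the file changes is unspecified by POSIX, so the second return value is only reported, not relied upon.

```python
import mmap
import os
import tempfile


def private_mapping_after_write() -> tuple:
    """Return (byte from a touched page, byte from an untouched page)."""
    page = mmap.PAGESIZE
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "text")
        with open(path, "wb") as f:
            f.write(b"A" * (2 * page))
        fd = os.open(path, os.O_RDWR)
        try:
            m = mmap.mmap(fd, 2 * page, access=mmap.ACCESS_COPY)  # MAP_PRIVATE
            m[0:1] = b"A"                        # store to page 0 forces a private copy
            os.pwrite(fd, b"B" * (2 * page), 0)  # "somebody else" rewrites the file
            touched = m[0:1]                     # comes from the private copy
            untouched = m[page:page + 1]         # may or may not reflect the write
            m.close()
            return touched, untouched
        finally:
            os.close(fd)
```

The touched page keeps `b"A"`; on Linux the untouched page typically shows the new `b"B"`, which is the behavior the quoted comment describes.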

The shrinking role of ETXTBSY

Posted Aug 23, 2021 20:18 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (1 responses)

What would be the benefit of doing that with mmap(MAP_PRIVATE|MAP_POPULATE) vs. just reading the entire file into the process's anonymous private memory? Of course even if you do that the file could change while you're reading it, so it would reduce the window where the data could be corrupted but not eliminate it altogether.

For that matter, any process that could rewrite or truncate a file while it's in use could also corrupt the data beforehand. ETXTBSY only protects against *accidentally* corrupting a file by updating it while it's in use, by forcing the update to fail. However, since we don't want the update to fail anyway, the solution which doesn't risk data corruption *or* an ETXTBSY error is to write the new data to a temporary file and rename it over the original. This does require write access to the parent directory, but that doesn't seem unreasonable to me since logically you are modifying the directory to point to a new file. Any attempt to atomically update the content without replacing the file will run into the issue that mappings follow the file, not the content.

The shrinking role of ETXTBSY

Posted Aug 23, 2021 21:28 UTC (Mon) by anton (subscriber, #25547) [Link]

> What would be the benefit of doing that with mmap(MAP_PRIVATE|MAP_POPULATE) vs. just reading the entire file into the process's anonymous private memory?

Zero-copy unless someone tries to write to the file. The way I imagine it, the write would block until the copying is completed, so this race condition would not exist.

The approach of writing a new file and renaming it over the old one would be a good one, but despite the inconvenience of ETXTBSY, linkers don't use this approach, so maybe the problem with the unwritable directories is more relevant than we think.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 9:31 UTC (Fri) by pabs (subscriber, #43278) [Link] (9 responses)

Instead of blocking writes, just before the first write is detected, could Linux load a snapshot of the entire file into RAM and adjust the process structures to use the RAM instead of the file on disk?

The shrinking role of ETXTBSY

Posted Aug 20, 2021 14:08 UTC (Fri) by jreiser (subscriber, #11027) [Link] (7 responses)

An executable main program or one shared library can exceed the total size of RAM plus swap space (thus tmpfs). This occurs frequently on "default" VMs that have only 2GB of RAM and no swap space.

The shrinking role of ETXTBSY

Posted Aug 20, 2021 23:39 UTC (Fri) by pabs (subscriber, #43278) [Link]

I guess during normal use only a fraction of the executable/libraries are loaded into RAM, but it surprises me that a single program could have more than 2GB of storage backing it, most of the graphical ones on my system don't seem to be more than 500MB according to:

du -L -c $(ldd `which foo` | sed 's/.*=>//;s/ (.*//' | grep -v linux-vdso.so.1)

The shrinking role of ETXTBSY

Posted Aug 21, 2021 0:02 UTC (Sat) by pabs (subscriber, #43278) [Link] (2 responses)

Perhaps instead of loading into RAM, Linux could just do the atomic rename dance behind the back of the process.

The shrinking role of ETXTBSY

Posted Aug 21, 2021 18:14 UTC (Sat) by developer122 (guest, #152928) [Link] (1 responses)

This seems like a much saner thing to do and I'm surprised nobody figured it out in the 80's.

The shrinking role of ETXTBSY

Posted Aug 22, 2021 2:00 UTC (Sun) by pabs (subscriber, #43278) [Link]

Based on some of the discussion early in this thread, it sounds like doing that would be very complicated or impossible.

The shrinking role of ETXTBSY

Posted Aug 23, 2021 8:47 UTC (Mon) by anton (subscriber, #25547) [Link] (2 responses)

On my Debian system, all binaries in /usr/bin combined have a total text size of 285MB (as reported by size -t) and all libraries in /usr/lib combined have a text size of 295MB, so keeping the whole text of a few binaries or libraries is unlikely to lead to problems that the system would not soon have otherwise. Assuming that you have the ETXTBSY problem on such default VMs at all (what are you using these VMs for where you get ETXTBSY?), I still think that most users prefer the remote chance of running out of memory to a certain ETXTBSY or, as seems to be in the works, a crash from having the binary changed during execution.

The shrinking role of ETXTBSY

Posted Aug 23, 2021 11:06 UTC (Mon) by excors (subscriber, #95769) [Link] (1 responses)

> On my Debian system, all binaries in /usr/bin combined have a total text size of 285MB (as reported by size -t) and all libraries in /usr/lib combined have a text size of 295MB, so keeping the whole text of a few binaries or libraries is unlikely to lead to problems that the system would not soon have otherwise.

Sometimes people run code that isn't shipped with Debian. Back in 2012, Facebook's page requests were handled by a single 1.5GB executable, generated from PHP code transpiled to C++. (https://arstechnica.com/information-technology/2012/04/ex...). I think they switched to a PHP JIT shortly after that, but I imagine other people had (or still have) even larger executables for similar reasons.

The shrinking role of ETXTBSY

Posted Aug 24, 2021 15:48 UTC (Tue) by anton (subscriber, #25547) [Link]

So someone who uses such a binary, runs it in a 2GB VM, and tries to overwrite this binary gets an out-of-memory condition in a VM rather than ETXTBSY or a crashing binary. So even this unlikely scenario is not really worse off, and in more usual scenarios things just work as intended.

The shrinking role of ETXTBSY

Posted Aug 21, 2021 7:21 UTC (Sat) by Homer512 (subscriber, #85295) [Link]

I guess a copy-on-write filesystem could support a version of memory-mapping that keeps the old content around while it is mapped, even if it's not yet in the page-cache.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds