The shrinking role of ETXTBSY
The "text" that is busy in this case refers to a program's executable code — it's text that is read by the CPU rather than by humans. When a program is run, its executable text is mapped into the running process's address space. When this happens, Unix systems have traditionally prevented the file containing that text from being modified; the alternative is to allow the code being run to be changed arbitrarily, which rarely leads to happy outcomes. For extra fun, the changed code will only be read if it is faulted into RAM, meaning that said unhappy outcomes might not happen until hours (or days) after the file has been overwritten. Rather than repeatedly explain to users why their programs have crashed in mysterious ways, Unix kernel developers chose many years ago to freeze the underlying file while those programs run — leading to the need to explain ETXTBSY errors instead.
Perhaps the easiest way to generate such an error is to try to rebuild a program while some process is still running it. Developers (those working in compiled languages, anyway) tend to learn early on to respond to "text file busy" errors by killing off the program they are debugging and rerunning make.
How it works
Deep within the kernel, the inode structure is used to represent files; one field within that structure is an atomic_t called i_writecount. Normally, this field can be thought of as a count of the number of times that the file is held open for writing. If, however, i_writecount is less than zero, it is interpreted instead as a count of the number of times that the writing of this file is being blocked. If the file is an executable file, then each process that runs it will decrement i_writecount for the duration of that execution. This field thus functions as a sort of simple lock. If its value is negative, the file cannot be opened for write access; if, instead, its value is positive, attempts to block write access will fail. (Similarly, an attempt to execute a file that is currently open for writing will fail with ETXTBSY).
In current kernels, it is possible to attempt to block write access with a call to deny_write_access(), but the more common way is to create a memory mapping with the VM_DENYWRITE flag set. So, for example, the execve() system call will map the code sections of the executable file into memory with VM_DENYWRITE; that mapping causes i_writecount to be decremented (this will fail if the file is open for writing, of course). When the mapping goes away (the running program exits or calls execve()), i_writecount will be incremented again; if it reaches zero, the file will once again become writable.
Back in the early days of Linux, prior to the Git era, the mmap() system call supported a flag called MAP_DENYWRITE that would cause VM_DENYWRITE to be set within the kernel and thus block write access to the mapped file for the duration of the mapping. There was a problem with this option, though: any process that could open a file for read access could map it with MAP_DENYWRITE and prevent any other process on the system from writing that file. That is, at best, an invitation to denial-of-service attacks, so it was removed long ago. Calls to mmap() with that flag set will succeed, but the flag is simply ignored.
Shared libraries
The removal of MAP_DENYWRITE had an interesting, if obscure, side effect. One may think of a file, such as /usr/bin/cat, as containing an executable program. In truth, though, much of the code that will be executed when somebody runs cat is not found in that file; instead, it is in a vast number of shared libraries. Those files contain executable code just like the nominal executable file, so one would think that they, too, would be protected from writing while in use.
Once upon a time, that was indeed the case; the ancient uselib() system call will map libraries with writing blocked. It may well be, though, that there are no systems still using uselib(); instead, on current systems, shared libraries are mapped from user space with mmap(). The MAP_DENYWRITE flag was created for just this use case, so that shared libraries could not be written while in use. When MAP_DENYWRITE went away, so did that protection; current Linux systems will happily allow a suitably privileged user to overwrite in-use shared libraries.
The end result of this history is that the memory-management subsystem has a bunch of leftover code, in the form of the support for MAP_DENYWRITE and VM_DENYWRITE, that no longer has any real purpose. So David Hildenbrand decided to take it out. With this patch set installed, execve() will simply call deny_write_access() directly, and mmap() no longer has to consider that case at all. This results in a user-space API change: uselib() no longer blocks write access to shared libraries. Nobody expects anybody to notice.
An idea whose time has passed?
In response to Hildenbrand's patch set, GNU C library developer Florian Weimer pointed out that the library has "a persistent issue with people using cp (or similar tools) to replace system libraries". He did not say that library developers have long since tired of explaining to those users why their applications crashed in mysterious ways, but there was no need to. It would be nice, he said, to provide a way to prevent this sort of error or, at least, a way to deterministically tell that a crash was caused by an overwritten library. There are a number of ways that could be established without bringing back MAP_DENYWRITE, he said.
The discussion wandered into other ways to protect shared libraries from being overwritten while in use; Eric Biederman suggested installing them with the immutable bit set, for example. But Linus Torvalds made it clear that he thought the problem was elsewhere:
The kernel ETXTBUSY thing is purely a courtesy feature, and as people have noticed it only really works for the main executable because of various reasons. It's not something user space should even rely on, it's more of a "ok, you're doing something incredibly stupid, and we'll help you avoid shooting yourself in the foot when we notice".
After Torvalds repeated that point a couple of times, Andy Lutomirski suggested just removing the write-blocking mechanism altogether:
It’s at best erratic — it only applies for static binaries, and it has never once saved me from a problem I care about. If the program I’m recompiling crashes, I don’t care — it’s probably already part way through dying from an unrelated fatal signal. What actually happens is that I see -ETXTBUSY, think “wait, this isn’t Windows, why are there file sharing rules,” then think “wait, Linux has *one* half baked file sharing rule,” and go on with my life.
Torvalds was amenable to the idea, though he worried that some application somewhere might depend on the ETXTBSY behavior. But he noted that it has been steadily weakened over time, and nobody has complained so far. Removing it could be tried, he continued: "Worst comes to worst, we'll have to put it back, but at least we'd know what crazy thing still wants it".
Al Viro worried, though, that some installation scripts might depend on this behavior; Christian Brauner added that allowing busy executable files to be written could make some security exploits easier. Hildenbrand said that his patch set already makes the write-blocking behavior much simpler, and that he would be in favor of leaving it in place for now. The second version of the patch set, posted on August 16, retains the ETXTBSY behavior for the main executable file.
Hildenbrand's simplification work seems sure to land during the 5.15 merge window; whether ETXTBSY will disappear entirely is rather less certain. Getting rid of it strikes some developers as a nice cleanup, but there is nothing forcing that removal to happen at this time. Meanwhile, the potential for user-space regressions always exists when behavior is changed in this way. The safe approach is thus to leave ETXTBSY in place for now.
[Postscript: Lutomirski pointed to mandatory locks as the one other place in the kernel that implements unwelcome file-sharing rules. That feature is indeed unpopular; the kernel document on mandatory locks starts with a section on why they should not be used. In 2015, a configuration option was added to make mandatory locks optional, and some distributors have duly disabled them. One potential outcome of the ETXTBSY discussion looks likely to be an effort to get other distributors to do the same until it becomes clear that mandatory locks can safely be removed. Stay tuned.]
| Index entries for this article | |
|---|---|
| Kernel | System calls/mmap() |
Posted Aug 19, 2021 15:49 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (21 responses)
Posted Aug 19, 2021 16:05 UTC (Thu)
by chris_se (subscriber, #99706)
[Link] (13 responses)
It would be great if the kernel had an open flag that could be combined with O_TRUNC that atomically replaces the file (assuming you would be able to open it without that flag), but keeps the old file's permissions / creation date / etc. -- and if the old file was still open it would be treated as if the old file had been deleted. (Similar in effect to how the create new file + atomic rename would work.) cp could then use that flag, same thing with compilers/linkers, which would also avoid the issues described in the article. And on platforms / older kernels that don't provide the flag the same tools could behave the same as before, as if the flag hadn't been set.
Posted Aug 19, 2021 18:40 UTC (Thu)
by ibukanov (subscriber, #3942)
[Link] (4 responses)
Posted Aug 29, 2021 23:55 UTC (Sun)
by skissane (subscriber, #38675)
[Link] (3 responses)
More generally, one could use this to enable what I've heard some people call "unit files". (I don't know if that is standard terminology at all, I heard it from some commenter on HN.) Basically, a unit file acts atomically – when you open it, you get a read snapshot of the file at the time you opened it. When you write to it, your changes aren't visible to other readers until you close the file (or call some kind of "commit" system call). (This would likely imply only one process can have the file open for writing at a time). Many apps would really benefit from these kind of transactional semantics, right now they have to use something like SQLite to get them, but with unit files they could get those semantics directly from the OS.
Of course, I doubt Linux is going to add any such features. It is complicated, probably wouldn't be used that much, doesn't fit well with POSIX, and could not be supported by existing filesystems without substantial changes (although a few do have some existing COW support, and they might be able to leverage that to implement this feature with only modest effort). But one can daydream about a parallel universe in which UNIX never really took off, and we are all using some open-source clone of VMS. In some ways no doubt an uglier universe, but in other ways a prettier one.
Posted Aug 31, 2021 13:20 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (2 responses)
Again harping on Pr1mos, and I don't know how easy it would be to retrofit to Linux, but files had reader/writer access controls. I don't know whether it was set by the first application to open it, or more likely set in the file system, but you had a choice of "multiple readers (or one writer)", "multiple readers and one writer", and "multiple readers and multiple writers".
So, as it implies, the first person to open the file got it, any other attempts were checked against this lock and the open succeeded or failed depending. When I wrote an accounts package, everything was configured NR&1W, all files were opened read-only unless actually updating the data, and all files were opened in a defined order to prevent deadlocks. (Still didn't prevent one user changing the data underneath another, but the program made sure this didn't matter...)
Cheers,
Posted Sep 16, 2024 17:38 UTC (Mon)
by jch (guest, #51929)
[Link] (1 responses)
Aren't those just mandatory file locks taken at open time?
Posted Sep 16, 2024 19:12 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
So my accounts system had "NR&1W" set on all files, and only ever opened a file to write when it was doing a commit. There was also always an explicit open hierarchy (as in I only ever opened individual clients after opening the client summary, so I couldn't get a deadlock, same for other ledgers).
So it relied on programming discipline, but could be proven to work if the rules were followed.
Cheers,
Posted Aug 19, 2021 22:21 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link] (6 responses)
I would think that this would require assigning a new inode number to keep the content separate. Even in your description you refer to "old file" and "new file", not "old content" and "new content". If you change the inode number, however, then you need to update the directory entry, which involves writing to the directory, so we're back to the original permission issue.
Posted Aug 19, 2021 23:15 UTC (Thu)
by SLi (subscriber, #53131)
[Link] (4 responses)
Posted Aug 20, 2021 5:57 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
Atomically swapping the *content* of two files (one of which could be temporary/unlinked) could be a useful operation for some cases which currently rely on atomic rename, and wouldn't have any issue with read-only directories, but it wouldn't help in this particular situation since any existing open file descriptions or mapped memory would immediately see the new content just as if the file had been modified with ftruncate(2) and write(2).
Posted Aug 20, 2021 17:47 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (1 responses)
This will cause all kinds of interesting race conditions unless handled carefully … but it could work.
Posted Aug 22, 2021 8:35 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link]
(Because, well, what's the alternative? Any global mutable state will inevitably race, at some level of abstraction, and global mutable state is literally the whole point of a filesystem.)
Posted Aug 20, 2021 6:42 UTC (Fri)
by pbonzini (subscriber, #60935)
[Link]
Posted Aug 20, 2021 10:55 UTC (Fri)
by runekock (subscriber, #50229)
[Link]
Posted Aug 24, 2021 19:21 UTC (Tue)
by flussence (guest, #85566)
[Link]
Admittedly it doesn't sound all that easy to implement something like this, even to a kernel layperson.
Posted Aug 19, 2021 16:37 UTC (Thu)
by mbunkus (subscriber, #87248)
[Link] (4 responses)
Posted Aug 19, 2021 16:41 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (3 responses)
Posted Aug 19, 2021 17:38 UTC (Thu)
by mbunkus (subscriber, #87248)
[Link] (2 responses)
I'm talking about new_file being smaller than (old_file + free_before_copying) but being larger than free_before_copying. If you copy-to-temporary with atomic-rename you'll run out of space whereas directly opening the target file for writing will work.
Take this dummy 1MB file system with one 800KB file occupying it, trying to overwrite the 800KB file with a 400KB file:
[0 root@sweet-chili /home/mosu/tmp/mp] df .
rsync does the copy-to-temporary with atomic-rename, but it has always done so & is therefore fine. Of course it does have options to deal with low-space file systems the user can turn on, notably "--delete-before".
For cp it would be a huge change in semantics.
Posted Aug 20, 2021 17:49 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (1 responses)
Posted Aug 20, 2021 18:19 UTC (Fri)
by mbunkus (subscriber, #87248)
[Link]
My main point was that such a change would change cp's semantics. I shouldn't have brought rsync up.
Posted Aug 21, 2021 7:15 UTC (Sat)
by Homer512 (subscriber, #85295)
[Link]
Posted Aug 30, 2021 16:03 UTC (Mon)
by nivedita76 (subscriber, #121790)
[Link]
What’s the use case for doing cp overwriting an existing executable or shared library?
Posted Aug 19, 2021 16:34 UTC (Thu)
by david.a.wheeler (subscriber, #72896)
[Link] (11 responses)
Clearly there's value in making the implementation code simpler (easier to maintain, more likely to be correct, etc.).
However, while it may be "purely a courtesy feature", courtesy is nice. Humans make an incredible number of mistakes; detecting & countering some ongoing footguns *can* be helpful, even if the detection only works in some cases. Especially since it's been that way "from the beginning"; applications might depend on it. It's not a security mechanism, so it doesn't need to counter "all cases" to be useful. I think this is best left in.
Posted Aug 21, 2021 23:28 UTC (Sat)
by neilbrown (subscriber, #359)
[Link] (9 responses)
I tend to agree. Further, I think it would be good to make it even more useful.
With a bit of work, we could make O_DENYWRITE | O_RDWR work so that the opener can write but nothing else can.
Obviously O_DENYWRITE would require write permission to the file, thus avoiding the DoS problems of MAP_DENYWRITE.
Posted Aug 22, 2021 21:53 UTC (Sun)
by neilbrown (subscriber, #359)
[Link] (7 responses)
and obviously that would make it useless for shared-library loaders and script interpreters.
So I've completely changed my mind. Only permissions should be effective at preventing a process from writing to a file. A correctly written program should choose not to write to a file that is in-use.
Maybe shared-library loaders and script interpreters should take a LOCK_SH flock on the file, and 'cp' should replace-instead-of-overwrite if it cannot get a LOCK_EX flock.
Posted Aug 23, 2021 0:40 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
I do not want my tools doing two completely different things depending on some entirely unrelated state. A given cp invocation should always overwrite, or it should always replace. It must not inspect the phase of the moon to decide which code path to use. That's just an outage waiting to happen. If you want it to support both use cases, add a flag and require the user to be explicit.
Posted Aug 23, 2021 1:03 UTC (Mon)
by neilbrown (subscriber, #359)
[Link] (4 responses)
And yet .... they already do.
"mv" will prefer the rename() systemcall, but if that returns EXDEV, it will fall back to copy-and-remove-original.
But maybe EXDEV not as "unrelated" as a flock?
Posted Aug 23, 2021 1:26 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (1 responses)
If the target is a pre-existing directory, it copies into it. If the target is a pre-existing file it overwrites it. And if the target doesn't exist, it makes a copy called the target.
In other words, if you are copying directories in a script, you need a whole bunch of wrapper code if you want the results to be reproducible.
Cheers,
Posted Aug 23, 2021 5:01 UTC (Mon)
by mchapman (subscriber, #66589)
[Link]
In other words, if you are copying directories in a script, you need a whole bunch of wrapper code if you want the results to be reproducible.
You can use
With GNU
Posted Aug 23, 2021 15:40 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
I find it baffling that you seem to tacitly assume the existing design is perfect. It's obviously not. The -T flag, for example, only needs to exist because cp and mv "helpfully" default to interpreting the last argument as -t if it is possible to do so. If you had to explicitly pass -t, then -T would not need to exist, and a whole class of bugs would have been eliminated.
Posted Aug 23, 2021 22:19 UTC (Mon)
by neilbrown (subscriber, #359)
[Link]
I did not mean to imply that, and am sorry that it came across that way.
I certainly see the attraction of having simple tools with simple semantics - "Do one thing and do it well". However I know from experience that such tools are usually *too* simple. Different people have different opinions about what the "one thing" should be, and about what it means to do it "well".
There is, and should be, a tension between adding functionality and maintaining simplicity. Both have value and there is no perfect balance. Rather we iteratively find a dynamic balance through robust "conversation" (which involves sharing both opinions and code).
I didn't imagine my proposal about how 'cp' could handle locking would be the final word on the subject, just another contribution to an ongoing conversation. Thanks for adding to the conversation!
Posted Aug 24, 2021 16:23 UTC (Tue)
by immibis (subscriber, #105511)
[Link]
Of course, that's probably impossible in current Linux.
Posted Sep 12, 2021 18:45 UTC (Sun)
by nix (subscriber, #2304)
[Link]
What if we arranged for ETXTBSY to be emitted by default for files which have pages mapped executable in some process's address space? (The flag would obviously be flipped by the first mmap(MAP_EXEC) of that file, and flipped off by munmap). This is, obviously, a change in semantics, but surely not even jitters rely on writing to executable file-backed pages (differing cache coherency semantics make this operation intrinsically nonportable in any case).
This prevents the "easy DoS" of open-with-a-flag-to-deny-overwriting -- the only DoS vector now available is the one we already have, "execute something and nobody else can overwrite it while you're executing from it", which seems entirely, y'know, desirable.
Posted Aug 22, 2021 8:42 UTC (Sun)
by niner (subscriber, #26151)
[Link]
Posted Aug 19, 2021 16:44 UTC (Thu)
by mb (subscriber, #50428)
[Link] (11 responses)
I would have expected them to use some kind of atomic-rename to avoid in-between states. As far as I understand it, this would avoid ETXTBSY (and also be crash-safe).
And what about install(1)?
Posted Aug 19, 2021 20:56 UTC (Thu)
by andyc (subscriber, #1130)
[Link] (1 responses)
install(1) by default will do an unlink(2) then a copy. Thus the old executable's inode remains in place until the program exits.
Posted Aug 20, 2021 0:41 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Aug 19, 2021 23:50 UTC (Thu)
by smcv (subscriber, #53363)
[Link] (5 responses)
I think dpkg actually goes further than that, by hard-linking all the old content to a backup name (foo.dpkg-old), then renaming all the new files so foo.dpkg-new replaces foo, and finally deleting all the foo.dpkg-old - so that if it gets an error halfway through unpacking a package, it can use all the foo.dpkg-old files to roll back to the old version in a consistent state.
Posted Aug 20, 2021 6:04 UTC (Fri)
by mb (subscriber, #50428)
[Link]
Posted Aug 20, 2021 8:04 UTC (Fri)
by Sesse (subscriber, #53779)
[Link] (3 responses)
Posted Aug 20, 2021 17:52 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (2 responses)
Yes this means that a power failure / kernel crash halfway through your update is likely to kill your system, but when that is sufficiently unlikely …
NB: WARNING: do not use this command for anything that triggers a reboot. Ever.
Posted Aug 21, 2021 11:51 UTC (Sat)
by Jonno (subscriber, #49613)
[Link]
Or you could use `dpkg --force-unsafe-io ...` / `apt --option DPkg::options::=--force-unsafe-io ...`.
That will only remove the fsync before renaming file X.dpkg-new to X, which gives you about 99% of the speed boost from eatmydata, without risking the apt and dpkg databases and logs.
(Meaning that if something does break you can inspect the logs and simply reinstall any packages that were unpacked since your last system-wide sync to fix your installation. If the broken package is essential it might be a bit tricky to do, but you can always boot from a different boot media, such as the Debian installer CD, and run `dpkg --instdir=/mnt/target ...` to get going again).
Posted Sep 16, 2024 9:15 UTC (Mon)
by MarcB (guest, #101804)
[Link]
Posted Aug 26, 2021 16:00 UTC (Thu)
by davecb (subscriber, #1574)
[Link] (2 responses)
If you think the latter sounds like the read-copy-update Linux uses inside the kernel, you'd be right: it comes from the same seminal papers.
The latter is the canonical way to update a file or library: I used it heavily on Unix and Solaris (I was part of the shared library team), update apparently uses it and I considered it one of the really impressive things about Unix-derived systems. And it fits in nicely as part of the "everything is a file" metaphor.
--dave
Posted Aug 26, 2021 16:05 UTC (Thu)
by davecb (subscriber, #1574)
[Link] (1 responses)
If so, that suggests that Linux could have the same "you're about to mess up one of your processes" warning if someone inadvertently attempted to use cp or rsync to update an in-use library, just like an in-use program.
Posted Sep 12, 2021 18:51 UTC (Sun)
by nix (subscriber, #2304)
[Link]
No. I'm fairly sure there is no way to do so: there was, but that was MAP_DENYWRITE, which, well, see this article... glibc is still careful to mmap its libraries with MAP_DENYWRITE, which the kernel then ignores :(
Posted Aug 19, 2021 18:32 UTC (Thu)
by evgeny (subscriber, #774)
[Link] (12 responses)
Weird. I've never encountered such an error, though certainly many times recompiled a program while it was running - either directly or under gdb. I feel like living in a parallel universe...
Posted Aug 19, 2021 18:51 UTC (Thu)
by Tomasu (guest, #39889)
[Link] (5 responses)
Posted Aug 19, 2021 19:10 UTC (Thu)
by evgeny (subscriber, #774)
[Link] (4 responses)
Posted Aug 19, 2021 23:13 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
0. Parse a Makefile and do file-level* dependency resolution based on the contents.
All other aspects of the build process are the responsibility of the specific commands you tell Make to run. Technically, you don't even need to use it to compile software, if you have some other process that can automatically transform input files into output files in a similar fashion to compilation.
* i.e. it has no integration with package managers etc., and instead it just knows things like "file X.o 'depends on' file X.c, because you told me that *.o files are made from *.c files of the same name."
Posted Aug 20, 2021 6:26 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
That isn't strictly true, at least with respect to GNU make. There are a number of built-in patterns. For example, in a directory containing only a source file "test.c" (and no makefile), if you run "make 'libtest.a(test.o)'" then GNU make will invoke the following commands automatically:
cc -c -o test.o test.c
ar rv libtest.a test.o
rm test.o
So without being told it knows at least how to compile .c files into .o files, add .o files to a static library, and remove the temporary .o files afterward. There is also at least one built-in implicit rule (%.out: %) which involves copying files: if you run "make test.c.out" it will invoke "cp test.c test.c.out". And if you have a file "script.sh" and run "make script" it will copy "script.sh" to "script"—using cat(1) and redirection rather than cp(1)—and then mark the resulting file as executable.
Besides the built-in rules, GNU make also removes any intermediate files generated when multiple implicit rules are chained to build a target: <https://www.gnu.org/software/make/manual/html_node/Chaine...>.
You can list all the built-in rules available on your system with "make -f /dev/null -p". There is also a partial list in the manual: <https://www.gnu.org/software/make/manual/html_node/Catalo...>.
Posted Aug 23, 2021 1:02 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
Make doesn't have any code which directly calls rename(2), unlink(2), etc. for the purposes of executing a recipe.* Recipes are (effectively) very small shell scripts, and it is rm(1) or cp(1) (or in the case of open(2) for a redirection, the shell) which actually makes those syscalls on make's behalf. Make doesn't "know" anything about how to copy a file, it just "knows" to run cp(1) and check the exit code.
* I have not looked, but I suppose it is theoretically possible that there is some code path where make creates a temporary file or something like that. That's not what I'm talking about here. I'm talking about the files that the user cares about, i.e. the ones that actually appear on the command line or in a recipe, whether explicit or implicit.
Posted Aug 23, 2021 14:11 UTC (Mon)
by skitt (subscriber, #5367)
[Link]
Posted Aug 19, 2021 19:03 UTC (Thu)
by clugstj (subscriber, #4020)
[Link] (1 responses)
Posted Aug 19, 2021 19:28 UTC (Thu)
by evgeny (subscriber, #774)
[Link]
Posted Aug 28, 2021 18:38 UTC (Sat)
by gnu_lorien (subscriber, #44036)
[Link] (3 responses)
I had the same feeling when reading this article.
I had an experience in a university class around 2006 where the professor claimed that you couldn't overwrite in-use files. I used Gentoo at the time and had never needed to shut down anything to recompile and update my system. The professor insisted this could happen but I couldn't find any scenario from the userspace tools I had available where I couldn't update in-use files.
I've also worked at companies that do cross-platform work for at least 15 years. It's a regular occurrence that things break for the Windows toolchain that don't break for the GNU/Linux toolchain, because Linux toolchains will happily update in-use binaries. It's one of the things I consider a huge misfeature of Windows that you have to go hunting down whoever is locking a file in order to finish a build. From a user perspective I would be shocked and annoyed if I started seeing ETXTBSY show up. I say good riddance.
Posted Aug 30, 2021 21:20 UTC (Mon)
by dancol (guest, #142293)
[Link] (2 responses)
Posted Aug 31, 2021 4:33 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
This has more to do with the way the memory mapping is managed on Windows than with anything else.
Posted Aug 31, 2021 12:59 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
Dunno exactly how it worked, but it was something like this: if the linker hit an open file, it knew the magic sauce to rename it out of the way, then write a new one.
Same idea as *nix's ability to delete an open file then put a new one in its place - anything which has the old file open will continue to use that until it closes it.
Cheers,
Posted Aug 19, 2021 19:07 UTC (Thu)
by clugstj (subscriber, #4020)
[Link] (1 responses)
Posted Aug 19, 2021 20:15 UTC (Thu)
by iabervon (subscriber, #722)
[Link]
Posted Aug 19, 2021 19:22 UTC (Thu)
by walters (subscriber, #7396)
[Link]
I tried hard to argue for O_OBJECT a while ago: https://marc.info/?l=linux-fsdevel&m=139963046823575&... I still think it makes sense.
Posted Aug 19, 2021 20:47 UTC (Thu)
by willy (subscriber, #9762)
[Link] (9 responses)
Well, hmm, no?
Assuming we're on a local filesystem (ie not NFSv3 or something), the write() goes directly into the page cache. Even if the application has used MAP_PRIVATE, that covers how to handle a store from the mmapper, not a write() from somebody else.
So the code does change under you. Now, I don't think we necessarily flush the CPU instruction cache at that point, so you might continue to execute some old instructions for a while, but at some point the CPU is going to notice that the i$ is out of date.
Unless you O_TRUNC, of course. Then, umm ... we get rid of all those pages immediately and your program segfaults straight away.
Posted Aug 19, 2021 23:20 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
You mean to tell me that I can instantly segfault an entire system by just running sudo truncate -s 0 libc.so?
I mean, I'm not that surprised, there are loads of ways a malicious or stupid root can break the system. I'm just impressed by the "segfault every userspace process at the same time" angle.
Posted Aug 20, 2021 0:33 UTC (Fri)
by Karellen (subscriber, #67644)
[Link] (1 responses)
Well, you can instantly kill every userspace process and panic the kernel at the same time on a system with sudo kill -9 -1 1*

* Probably. I have not just tried this.
Posted Aug 23, 2021 12:17 UTC (Mon)
by anselm (subscriber, #2796)
[Link]
You can't kill -9 1; the init process is protected from signals for which it doesn't have an explicitly installed signal handler.
Posted Aug 20, 2021 8:32 UTC (Fri)
by taladar (subscriber, #68407)
[Link]
Posted Aug 22, 2021 22:02 UTC (Sun)
by Paf (subscriber, #91811)
[Link] (1 responses)
So, the interpreter doesn't read in a copy or anything? Unless it's mmapping, it's going to have a copy of at least part of the file. That's how read() works.
User space doesn't work from the page cache unless it's mmapping.
Posted Aug 22, 2021 22:49 UTC (Sun)
by willy (subscriber, #9762)
[Link]
My system shows the 'cat' binary mapped five times. One is executable.
But thanks for explaining to me how the read() system call and the page cache works.
Posted Aug 23, 2021 9:06 UTC (Mon)
by anton (subscriber, #25547)
[Link] (2 responses)
So that is what could be done on writing to an executed text file: in every affected process, make private copies of the pages of the whole original text (as if on copy-on-write) and populate the mapping with them. There is an opportunity for sharing between several processes that run the same changed binary, but I guess that the benefit is too small and too rare to make that effort. OTOH, the benefit of not having ETXTBSY and not having processes crash when their binary changes is more substantial IMO.
Actually, I would like that also for interpreters (I have had a number of shell scripts crash when I edited them while they were running), maybe by making MAP_PRIVATE|MAP_POPULATE behave that way, or with an additional flag to mmap().
Posted Aug 23, 2021 20:18 UTC (Mon)
by nybble41 (subscriber, #55106)
[Link] (1 responses)
For that matter, any process that could rewrite or truncate a file while it's in use could also corrupt the data beforehand. ETXTBSY only protects against *accidentally* corrupting a file by updating it while it's in use, by forcing the update to fail. However, since we don't want the update to fail anyway, the solution which doesn't risk data corruption *or* an ETXTBSY error is to write the new data to a temporary file and rename it over the original. This does require write access to the parent directory, but that doesn't seem unreasonable to me since logically you are modifying the directory to point to a new file. Any attempt to atomically update the content without replacing the file will run into the issue that mappings follow the file, not the content.
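The write-then-rename pattern described above can be sketched in a few lines (Python; `atomic_replace` is an illustrative name, not a standard API). Old openers keep the old inode, new opens see the new one, and the running text is never overwritten, so no ETXTBSY:

```python
import os
import tempfile

def atomic_replace(path, data):
    # Write the new data to a temporary file in the same directory
    # (rename is only atomic within one filesystem), then rename it
    # over the original.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        os.write(fd, data)
        os.fsync(fd)          # make the data durable before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)     # atomic rename over the old file

# usage
with open("prog.bin", "wb") as f:
    f.write(b"old")
atomic_replace("prog.bin", b"new")
with open("prog.bin", "rb") as f:
    content = f.read()
print(content)                # b'new'
os.remove("prog.bin")
```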
Posted Aug 23, 2021 21:28 UTC (Mon)
by anton (subscriber, #25547)
[Link]
The approach of writing a new file and renaming it over the old one would be a good one, but despite the inconvenience of ETXTBSY, linkers don't use this approach, so maybe the problem with unwritable directories is more relevant than we think.
Posted Aug 20, 2021 9:31 UTC (Fri)
by pabs (subscriber, #43278)
[Link] (9 responses)
Posted Aug 20, 2021 14:08 UTC (Fri)
by jreiser (subscriber, #11027)
[Link] (7 responses)
Posted Aug 20, 2021 23:39 UTC (Fri)
by pabs (subscriber, #43278)
[Link]
du -L -c $(ldd `which foo` | sed 's/.*=>//;s/ (.*//' | grep -v linux-vdso.so.1)
Posted Aug 21, 2021 0:02 UTC (Sat)
by pabs (subscriber, #43278)
[Link] (2 responses)
Posted Aug 21, 2021 18:14 UTC (Sat)
by developer122 (guest, #152928)
[Link] (1 responses)
Posted Aug 22, 2021 2:00 UTC (Sun)
by pabs (subscriber, #43278)
[Link]
Posted Aug 23, 2021 8:47 UTC (Mon)
by anton (subscriber, #25547)
[Link] (2 responses)
Posted Aug 23, 2021 11:06 UTC (Mon)
by excors (subscriber, #95769)
[Link] (1 responses)
Sometimes people run code that isn't shipped with Debian. Back in 2012, Facebook's page requests were handled by a single 1.5GB executable, generated from PHP code transpiled to C++. (https://arstechnica.com/information-technology/2012/04/ex...). I think they switched to a PHP JIT shortly after that, but I imagine other people had (or still have) even larger executables for similar reasons.
Posted Aug 24, 2021 15:48 UTC (Tue)
by anton (subscriber, #25547)
[Link]
Posted Aug 21, 2021 7:21 UTC (Sat)
by Homer512 (subscriber, #85295)
[Link]
The shrinking role of ETXTBSY
File version numbers à la OpenVMS?
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 996 797 128 87% /home/mosu/tmp/mp
[0 root@sweet-chili /home/mosu/tmp/mp] ls -l
total 796
-rw-r--r-- 1 root root 800000 2021-08-19 19:30 800KB
[0 root@sweet-chili /home/mosu/tmp/mp] ls -l ../*KB
-rw-r--r-- 1 root root 400000 2021-08-19 19:30 ../400KB
-rw-r--r-- 1 root root 800000 2021-08-19 19:30 ../800KB
[0 root@sweet-chili /home/mosu/tmp/mp] cp ../400KB .
cp: error writing './400KB': No space left on device
[1 root@sweet-chili /home/mosu/tmp/mp] rm 400KB
[0 root@sweet-chili /home/mosu/tmp/mp] cp ../400KB 800KB
[0 root@sweet-chili /home/mosu/tmp/mp] ls -l
total 404
-rw-r--r-- 1 root root 400000 2021-08-19 19:33 800KB
[0 root@sweet-chili /home/mosu/tmp/mp]
Interesting discussion on ETXTBSY
Currently only executables benefit from ETXTBSY. What if we added an O_DENYWRITE open flag?
Then shared-library loaders could set this, as could script interpreters.
This would be great for swap files.
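A sketch of the status quo the proposal wants to extend (Python, Linux assumed; `busysleep` is just a scratch file name): only a binary that is actually being execve()'d gets the i_writecount protection. We copy a small binary we own, run it, and try to open it for writing.

```python
import errno
import os
import shutil
import subprocess
import time

# Copy a binary we own, so a plain permission error can't mask ETXTBSY.
shutil.copy("/bin/sleep", "busysleep")
os.chmod("busysleep", 0o755)

# Run the copy; pass argv[0] as "sleep" in case /bin/sleep is busybox.
child = subprocess.Popen(["sleep", "30"], executable="./busysleep")
time.sleep(0.3)                       # let the exec complete
try:
    os.open("busysleep", os.O_WRONLY)
    result = "opened (unexpected)"
except OSError as e:
    result = errno.errorcode[e.errno]
finally:
    child.kill()
    child.wait()
    os.remove("busysleep")
print(result)                         # ETXTBSY
```

A script run by an interpreter gets no such protection today, because only the interpreter's text is mapped; that is the gap an opt-in O_DENYWRITE flag would fill.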
This would require a way to find out if a file is in use. If there are needs in this area, that is the direction in which we should innovate.
cp -R source/. dest/, which will do the right thing whether or not dest is an existing directory.
With GNU cp, you can also do cp -R --no-target-directory source/ dest/.
What I was trying to highlight was that your stated desire about behaviour of tools was already thwarted by reality. The desire came across as a bit naive.
Unix atomic actions, or how to replace an executable or library
* open had the O_CREAT flag (combined with O_EXCL), which would atomically create a file, or fail if it already existed
* mv atomically replaced the inode under a filename, allowing new openers of the file to get the new version, and everyone with an open copy to continue using the old one.
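The first of those atomic actions is the classic lock-file idiom: creation itself is the test-and-set. A minimal sketch (Python; `try_lock` is an illustrative name):

```python
import os

def try_lock(path):
    # With O_CREAT|O_EXCL, exactly one caller can create the file;
    # everyone else fails atomically.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
        return True       # we created it; we hold the lock
    except FileExistsError:
        return False      # lost the race

first = try_lock("demo.lock")
second = try_lock("demo.lock")
print(first, second)      # True False
os.remove("demo.lock")
```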
1. Check the modification times of the input and output files, and identify what needs to be rebuilt.
2. Run the commands you tell it to run in order to rebuild something.
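Step 1 above boils down to a modification-time comparison, which can be sketched as follows (Python; `needs_rebuild` is an illustrative name, not make's actual code):

```python
import os
import time

def needs_rebuild(target, sources):
    # Rebuild if the target is missing, or if any source is newer.
    if not os.path.exists(target):
        return True
    t = os.path.getmtime(target)
    return any(os.path.getmtime(s) > t for s in sources)

# usage: fake a stale target with explicit timestamps
open("main.c", "w").close()
open("main.o", "w").close()
now = time.time()
os.utime("main.o", (now, now))
os.utime("main.c", (now + 10, now + 10))   # source "edited" later
stale = needs_rebuild("main.o", ["main.c"])
print(stale)                               # True
os.remove("main.c")
os.remove("main.o")
```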
ar rv libtest.a test.o
rm test.o
Even if the application has used MAP_PRIVATE, that covers how to handle a store from the mmapper, not a write() from somebody else.
That's disturbing. Still, if a MAP_PRIVATE page is written to (e.g., with its original content) in one place, it is copied on that write, and later changes to the original don't affect it.
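The touch-every-page trick can be demonstrated directly (Python, Linux assumed): rewriting each MAP_PRIVATE page with its own content forces copy-on-write up front, after which writes to the underlying file no longer reach the mapping.

```python
import mmap
import os
import tempfile

# One-page file of known content.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * mmap.PAGESIZE)
os.close(fd)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    m[:] = m[:]                      # touch every page: CoW, same bytes
    f.write(b"B" * mmap.PAGESIZE)    # now overwrite the file itself
    f.flush()
    mapped_byte = m[:1]              # the private copies survived
    print(mapped_byte)               # b'A'
    m.close()
os.remove(path)
```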
What would be the benefit of doing that with mmap(MAP_PRIVATE|MAP_POPULATE) vs. just reading the entire file into the process's anonymous private memory?
Zero-copy unless someone tries to write to the file. The way I imagine it, the write would block until the copying is completed, so this race condition would not exist.
On my Debian system, all binaries in /usr/bin combined have a total text size of 285MB (as reported by size -t) and all libraries in /usr/lib combined have a text size of 295MB, so keeping the whole text of a few binaries or libraries is unlikely to lead to problems that the system would not soon have otherwise. Assuming that you have the ETXTBSY problem on such default VMs at all (what are you using these VMs for where you get ETXTBSY?), I still think that most users prefer the remote chance of running out of memory to a certain ETXTBSY or, as seems to be in the works, a crash from having the binary changed during execution.
So someone who uses such a binary, runs it in a 2GB VM, and tries to overwrite this binary gets an out-of-memory condition in a VM rather than ETXTBSY or a crashing binary. So even this unlikely scenario is not really worse off, and in more usual scenarios things just work as intended.
