The 3.11 merge window closes
Posted Jul 17, 2013 9:27 UTC (Wed) by Karellen (subscriber, #67644)
In reply to: The 3.11 merge window closes by epa
Parent article: The 3.11 merge window closes
You can do that already by creating a file with the "wrong" name (e.g. "config.cfg.new") and calling rename(2) when you're finished. e.g.
  rename("config.cfg.new", "config.cfg");
"instead open with O_TMPFILE and atomically link it with a certain name once you're finished."
Unfortunately, there currently is no way to create a new link to a file for which you only have a file handle.
There are some cases in which it wouldn't make sense, such as trying to link a socket fd into a filesystem, or creating a link in a mountpoint where the file does not reside, but a hypothetical fdlink() could return EXDEV as per rename() in that case. In any case, no-one's implemented it yet.
"The piece that's missing is an atomic link-and-unlink operation where you link a file into a directory with a given name, at the same time unlinking any file that was previously there with that name"
rename(2) already does this.
"(or even renaming the existing file atomically)."
You can do this with link(2), by e.g.
  link("config.cfg", "config.cfg.old");
  rename("config.cfg.new", "config.cfg");
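Spelled out with error handling, the link-then-rename sequence above might look like the following. This is only a sketch: the function name replace_with_backup() and the helper write_all() are made up for illustration, and it does nothing about crash safety (no fsync).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical function: write `data` to `tmp_path`, keep the current
 * `path` reachable as `backup_path`, then move the new file into place.
 * Returns 0 on success, -1 on error. */
int replace_with_backup(const char *path, const char *tmp_path,
                        const char *backup_path, const char *data)
{
    assert(path && tmp_path && backup_path && data);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = write_all(fd, data, strlen(data));
    if (close(fd) < 0)
        rc = -1;
    if (rc < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* link(2) fails with EEXIST if the backup name is taken,
     * so drop any stale backup first. */
    if (unlink(backup_path) < 0 && errno != ENOENT)
        return -1;
    /* ENOENT here only means there was no previous version to keep. */
    if (link(path, backup_path) < 0 && errno != ENOENT)
        return -1;

    /* rename(2) replaces `path` in one step. */
    return rename(tmp_path, path);
}
```

A call such as replace_with_backup("config.cfg", "config.cfg.new", "config.cfg.old", text) then corresponds to the two calls shown above.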
      Posted Jul 17, 2013 10:12 UTC (Wed)
                               by epa (subscriber, #39769)
                              [Link] (2 responses)
       
I believe that rename is not fully atomic.
However, it is atomic in the looser sense that the filename at any moment links to either the old file or the new one.  That may be good enough for many applications.
      
           
     
    
      Posted Jul 17, 2013 10:40 UTC (Wed)
                               by Karellen (subscriber, #67644)
                              [Link] 
       
Doh! That'll teach me to skim some messages. 
Wow. That's incredibly neat. Hadn't thought of/seen that before. Thanks. 
     
      Posted Jul 17, 2013 17:22 UTC (Wed)
                               by dlang (guest, #313)
                              [Link] 
       
as long as the target path always exists, and always points at either the old or the new, you should be in good shape. 
 
now, to be crash safe, you need to fsync the file before doing the rename, and you need to not be using ext3 which has such horrid fsync behavior. 
     
      Posted Jul 17, 2013 10:43 UTC (Wed)
                               by mjg59 (subscriber, #23239)
                              [Link] (8 responses)
       
     
    
      Posted Jul 17, 2013 23:35 UTC (Wed)
                               by Karellen (subscriber, #67644)
                              [Link] (2 responses)
       
I've seen that argument before, but it's always confused me. Surely that's only wanted as protection against an unexpected system crash/failure? Except - I didn't think that POSIX made any guarantees at all in that event. I thought your OS was "allowed" to overwrite your partition tables and FS journals completely in the event of a crash and still be POSIX-compliant. 
(If not, how does POSIX expect to guarantee otherwise, unless POSIX compliance requires the absence of certain classes of bugs?) 
Looking at the rationale section of the POSIX fsync() documentation [0], fsync() is allowed to be the null operation, or to not cause data to actually be written, and fsync() correctness could be considered a QoI issue. 
However, the Open Group website documentation is the closest thing I have to the actual POSIX spec. If there is another section somewhere dealing with the general problem of compliance in the face of bugs/power outages which is more enlightening, I would welcome a link to it, or a quote from it. 
(FWIW, I think that Linux writing metadata before data is a poor QoI decision, and that the filesystem devs should strive to do otherwise, no matter what POSIX allows. However, IANA Kernel/FS developer, and am not properly informed on how hard, impractical or pessimal that might be.) 
[0] http://pubs.opengroup.org/onlinepubs/009695399/functions/... 
     
    
      Posted Jul 18, 2013 0:25 UTC (Thu)
                               by mjg59 (subscriber, #23239)
                              [Link] 
       
     
      Posted Jul 20, 2013 17:56 UTC (Sat)
                               by giraffedata (guest, #1954)
                              [Link] 
       
The thing about adherence to any standard is that one specifies the very adherence with myriad conditions, most of them implied.  So POSIX doesn't say, "if the system crashes, a read doesn't have to get back the same data that was written."  Rather, the system designer says, "the system is POSIX-compliant as long as the system never crashes."  And as I said, that condition is usually not actually spoken.  There are tons of similar conditions: the superuser does not write directly to the disk; the disk drive never makes a mistake; cosmic rays don't change magnetic state; etc.
 
Of course, designers do whatever they can to reduce the conditions; few systems today are offered on an "if the power ever goes out, nothing in the POSIX standard applies" basis.
 
Fsync drives us into the awkward territory of robustness.  Robustness is a system's ability to work when it is broken.  That contradiction in terms is why any specification of fsync is bound to be fuzzy.  It's like saying, "I will pay you back by Tuesday.  If I don't, ..."
      
           
     
      Posted Jul 18, 2013 14:06 UTC (Thu)
                               by Tobu (subscriber, #24111)
                              [Link] (4 responses)
The unavailability of good (O_PONIES) semantics continues to amaze me.
The only option right now seems to be a combination of f(data)sync and deferred threads; but introducing threads has a nasty engineering cost.
The last I've seen of these issues (on the XFS list), maintainers were willing to take a new flag (don't know if that's possible; the VFS seems misdesigned to ignore new flags, see O_TMPFILE above) or a new VFS syscall that might be progressively implemented.
 
     
    
      Posted Jul 18, 2013 14:23 UTC (Thu)
                               by viro (subscriber, #7872)
                              [Link] (3 responses)
       
See the talk by Michael Kerrisk re ABI suckitude a while ago - this is a prime example of such ;-/ 
     
    
      Posted Jul 18, 2013 15:18 UTC (Thu)
                               by Cyberax (✭ supporter ✭, #52523)
                              [Link] (2 responses)
       
Just create a new syscall, say open2(), with a better-designed ABI. Old programs can still use open() and new ones can use the new syscall to get new features. 
     
    
      Posted Jul 18, 2013 19:58 UTC (Thu)
                               by paulj (subscriber, #341)
                              [Link] 
       
     
      Posted Jul 21, 2013 22:05 UTC (Sun)
                               by nix (subscriber, #2304)
                              [Link] 
       
No, what you do if you really think programs will care is introduce a new open2(), wire it to a new version of open() in glibc, change the values of all the O_* constants in glibc (but *not* the kernel) to some new value range that doesn't intersect the old, and have glibc redirect all calls using any old flag values to the old open() and all new ones to open2(), mapping the 'new' flag values in the userspace API to the kernel values (probably by subtracting a constant). You can also expose the old flags under new names, OBS_EXCL and the like. That way, old apps get the old syscall, new ones get the new syscall, and new apps that really, really want the old semantics can get them. 
If you thought it mattered that much, and really needed to do it, that's how you'd do it. No uglifying programs with horrible open2() nonsense. (Yes, you need a new glibc version to use this, but you need a new glibc to use any new syscall *anyway*.) 
 
     
"Unfortunately, there currently is no way to create a new link to a file for which you only have a file handle."
I thought the linkat trick described by Al Viro in the grandparent comment would achieve that.
rename is not fully atomic. As the manual page says, "However, when overwriting there will probably be a window in which both oldpath and newpath refer to the file being renamed." It's also not atomic over NFS (though enhancements to the NFS protocol may be out of scope for Linux kernel discussions).
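For reference, the linkat approach mentioned above can be sketched roughly as follows, based on the pattern documented in the open(2) manual page: open an unnamed file with O_TMPFILE, then link it in through /proc/self/fd. The function name create_then_link() and the helper write_all() are hypothetical, and the sketch assumes a kernel and filesystem with O_TMPFILE support and a mounted /proc.

```c
#define _GNU_SOURCE            /* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical function: create an unnamed file in `dir`, fill it,
 * and only then give it the name `path` (which must be on the same
 * filesystem as `dir`). Returns 0 on success, -1 on error. */
int create_then_link(const char *dir, const char *path, const char *data)
{
    assert(dir && path && data);

    /* Fails on kernels or filesystems without O_TMPFILE support. */
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    int rc = write_all(fd, data, strlen(data));
    if (rc == 0) {
        char proc_path[64];
        snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
        /* linkat() does not replace an existing name: it fails with
         * EEXIST if `path` is already there. */
        rc = linkat(AT_FDCWD, proc_path, AT_FDCWD, path, AT_SYMLINK_FOLLOW);
    }

    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

Because linkat() refuses to overwrite an existing name, replacing a file this way still means linking to a temporary name and calling rename(2) afterwards.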
The meaning of fsync
      
I thought your OS was "allowed" to overwrite your partition tables and FS journals completely in the event of a crash and still be POSIX-compliant.