The copy problem is really the backup problem
Posted May 30, 2019 15:01 UTC (Thu) by mcr (subscriber, #99374)
Parent article: The Linux "copy problem"
In the old days of Unix, with a single file system, we used "dump" to get a good copy. It had all sorts of ridiculous issues that came from having a userspace program try to decipher file system contents from raw reads of the disk. On the other hand, when it worked, it got all the metadata, did so without destroying the buffer cache, and was often able to back up disks that were in the process of dying. When it failed, it failed, and the backups were sometimes garbage. And it didn't work for many things. So people mostly use tar for backup. And that should be the most common copy problem, which is not just about data centers or cluster environments. And tar fails for any file system that does something innovative.
My claim is that our VFS layer is incomplete: it should include an atomic backup and an atomic restore operation, at least on a file level, but optionally on a directory basis. If we had that, then cp would always usefully be backup file | restore file2. This means that file systems have to serialize file contents and metadata, and have to deserialize them too. If Linux had a microkernel architecture, much of this deserialization could probably be done in some system-provided, non-ring0 context. Whether we pick tar for serialization or something more modern like CBOR is a bike shed for a design team.
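To make the shape of "backup file | restore file2" concrete, here is a rough userspace sketch in Python. It is only an illustration under loud assumptions: the names backup() and restore() and the wire layout (a length-prefixed JSON header followed by the raw data) are made up for this example and are not an existing kernel or VFS interface. It also cannot be atomic on the backup side, since the stat and the read are separate calls, which is exactly the gap a real VFS operation would have to close.

```python
import base64
import json
import os
import struct


def backup(path):
    """Serialize one file (contents plus some metadata) into bytes.

    Hypothetical format: 4-byte big-endian header length, JSON header,
    then the raw file data. NOT atomic: the file may change between
    the stat() and the read().
    """
    st = os.stat(path)
    meta = {
        "mode": st.st_mode & 0o7777,
        "atime_ns": st.st_atime_ns,
        "mtime_ns": st.st_mtime_ns,
        "xattrs": {},
    }
    # Extended attributes are only exposed on some platforms (e.g. Linux).
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path):
                value = os.getxattr(path, name)
                meta["xattrs"][name] = base64.b64encode(value).decode("ascii")
        except OSError:
            pass  # file system without xattr support, or unreadable attribute
    with open(path, "rb") as f:
        data = f.read()
    header = json.dumps(meta).encode("utf-8")
    return struct.pack(">I", len(header)) + header + data


def restore(blob, dest):
    """Recreate a file at dest from bytes produced by backup()."""
    (hlen,) = struct.unpack(">I", blob[:4])
    meta = json.loads(blob[4:4 + hlen].decode("utf-8"))
    data = blob[4 + hlen:]
    tmp = dest + ".restore-tmp"  # hypothetical temporary name
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Set xattrs before chmod, in case the final mode is read-only.
    if hasattr(os, "setxattr"):
        for name, value in meta["xattrs"].items():
            try:
                os.setxattr(tmp, name, base64.b64decode(value))
            except OSError:
                pass  # e.g. privileged namespaces such as security.*
    os.chmod(tmp, meta["mode"])
    os.utime(tmp, ns=(meta["atime_ns"], meta["mtime_ns"]))
    # The rename makes the restored file appear all at once on POSIX systems.
    os.replace(tmp, dest)
```

With these two functions, a copy is just restore(backup("file"), "file2"). Anything the header does not name (ACLs, reflinks, holes, file system specific features) is silently lost, which is the same way tar fails today; only the file system itself knows the full list.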
I would just be happy if we could agree that we need this functionality.
