What about other filesystems?

Posted Jan 16, 2021 20:49 UTC (Sat) by NYKevin (subscriber, #129325)
In reply to: What about other filesystems? by matthias
Parent article: Fast commits for ext4

The fundamental problem with this argument is that the API you describe can be (and has been) implemented in userspace (in the form of SQLite, as well as numerous "real" databases). Therefore, if you want to argue in favor of doing this in kernel space, it is not enough to argue that a new API would be "better" in various ways. You need to *specifically* address one question: Why should anyone re-implement already working userspace code in the kernel? Would it provide some performance advantage? Would it somehow enable you to do things that you can't currently do? Or would it just be "more convenient"? If the latter, how is that the kernel's problem?

What about other filesystems?

Posted Jan 17, 2021 17:40 UTC (Sun) by Wol (subscriber, #4433)

What I find hard to understand is: if the database (SQLite, whatever) is using Linux syscalls, how does it know the data has actually been written? Or does it do loads of sync()s, and then pause all writes for ten seconds or so while waiting for the data to flush?

I can see how databases can provide 99.999% reliability. I'm active on the RAID list. I know all about disk timeouts, disks lying, how long things take to get flushed, and so on. I simply do not see how an application can guarantee safety.

As for "why should it be in the kernel" - because LOTS of developers will benefit from the ability to reason about the state of a system in a crash scenario. Why should all the database developers be forced to duplicate each other's work?

And frankly, if I commit something to the filesystem for saving, surely I should be able to ask the filesystem "have you saved it?" AND BE ABLE TO RELY ON THE ANSWER! (Yep, I know disks lie, and I don't expect the file system necessarily to deal with that, but it really should be held responsible for its own actions!)

Cheers,
Wol

What about other filesystems?

Posted Jan 17, 2021 22:13 UTC (Sun) by NYKevin (subscriber, #129325)

The process that SQLite uses is documented in great detail at https://sqlite.org/atomiccommit.html.

TL;DR: They make a copy ("rollback journal") of the data they are about to overwrite, fsync that copy, overwrite the data, fsync the database itself, and finally delete the rollback journal.
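
To make the sequence above concrete, here is a rough Python sketch of the same idea. It is not SQLite's code or its journal format: the function names (atomic_overwrite, recover, fsync_dir), the "-journal" file layout, and the 8-byte offset header are all made up for illustration. It assumes a POSIX system where a directory can be opened and fsync'd, and it only handles overwrites inside the existing file. The real thing also uses journal headers, checksums, and locking, which are left out here.

```python
import os

def fsync_dir(dir_path):
    # Flush the directory so that creating/deleting the journal is durable.
    # Assumption: POSIX semantics (a directory can be opened read-only and fsync'd).
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def atomic_overwrite(db_path, offset, new_bytes):
    """Overwrite part of db_path so that a crash leaves either old or new data."""
    journal_path = db_path + "-journal"   # hypothetical naming
    dir_path = os.path.dirname(os.path.abspath(db_path))
    if offset + len(new_bytes) > os.path.getsize(db_path):
        raise ValueError("sketch only supports overwrites inside the file")

    with open(db_path, "r+b") as db:
        # 1. Copy the data that is about to be overwritten into the journal.
        db.seek(offset)
        old_bytes = db.read(len(new_bytes))
        with open(journal_path, "wb") as journal:
            journal.write(offset.to_bytes(8, "big"))
            journal.write(old_bytes)
            journal.flush()
            # 2. fsync the journal before touching the database.
            os.fsync(journal.fileno())
        fsync_dir(dir_path)

        # 3. Overwrite the data in place.
        db.seek(offset)
        db.write(new_bytes)
        db.flush()
        # 4. fsync the database itself.
        os.fsync(db.fileno())

    # 5. Delete the journal; once this is durable, the change is committed.
    os.remove(journal_path)
    fsync_dir(dir_path)

def recover(db_path):
    """After a crash: if a journal is left over, restore the old data from it."""
    journal_path = db_path + "-journal"
    dir_path = os.path.dirname(os.path.abspath(db_path))
    if not os.path.exists(journal_path):
        return False                      # nothing to roll back
    with open(journal_path, "rb") as journal:
        data = journal.read()
    if len(data) >= 8:                    # ignore a journal with no usable header
        offset = int.from_bytes(data[:8], "big")
        with open(db_path, "r+b") as db:
            db.seek(offset)
            db.write(data[8:])
            db.flush()
            os.fsync(db.fileno())
    os.remove(journal_path)
    fsync_dir(dir_path)
    return True
```

The ordering is the whole point: the journal is on disk before the database is touched, so a leftover journal at startup means "roll back", and no journal means the last commit completed. Each fsync call is where the application waits for the kernel to report that the data has been written, rather than sleeping and hoping.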