Adding an in-kernel TLS handshake

Posted Jun 2, 2022 18:50 UTC (Thu) by jlayton (subscriber, #31672)
In reply to: Adding an in-kernel TLS handshake by james
Parent article: Adding an in-kernel TLS handshake

Yes, that's all true. Also, the kernel is just better overall at avoiding these situations these days. It's more proactive about flushing and blocking new pages from being dirtied when things aren't being cleaned.

I agree that a userland implementation is definitely the way to go. We may need the daemon to be extra careful to avoid allocations in critical codepaths, which may be difficult depending on what the TLS libraries do under the hood.

Adding an in-kernel TLS handshake

Posted Jun 3, 2022 0:54 UTC (Fri) by NYKevin (subscriber, #129325) (3 responses)

Strictly speaking, can't the kernel mark the writes as bad even after it has accepted them, and return EIO on close/fsync? That's probably not very *nice*, but if the writes physically cannot be persisted anyway, you may as well let the application know that its data got lost.

But OTOH neither the man pages nor POSIX are very clear about what EIO even means or how userspace should react to it, so I imagine there are some applications that will freak out and do weird things if you return that error. Amazingly, POSIX does not even tell you what the state of the file descriptor is after close(2) fails with EIO, which means you have no way of knowing (assuming a POSIX-only environment that lacks /proc/self/fd) whether the file descriptor still exists and still needs to be closed! I guess the only safe way is to loop and repeatedly call close until you get EBADF? But that's obviously not thread-safe, and I could imagine a brain-dead implementation that just keeps returning EIO and never deallocates the fd.
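On Linux specifically, the close(2) man page does settle this: the descriptor is released even when close() returns an error, and retrying is described as the wrong thing to do because another thread may already have reused the number. A minimal sketch under that Linux-only assumption follows; `close_once` is a made-up helper name, not a libc function, and other POSIX systems may behave differently.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Close fd exactly once and report what happened.
 * Returns 0 on success, or the errno value set by close().
 *
 * Assumption (Linux semantics): the descriptor is gone after close()
 * returns, whether or not it reported an error, so we never retry.
 */
int close_once(int fd)
{
    if (close(fd) == 0)
        return 0;

    /* EIO, EINTR, EBADF, ...: log it if you care, but do not call
     * close(fd) again; the number may already belong to someone else. */
    return errno;
}
```

A caller that cares about lost data would call fsync() before this and treat the fsync() result, not the close() result, as the authoritative one.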

Adding an in-kernel TLS handshake

Posted Jun 3, 2022 14:27 UTC (Fri) by Wol (subscriber, #4433) (1 responses)

> Strictly speaking, can't the kernel mark the writes as bad even after it has accepted them, and return EIO on close/fsync? That's probably not very *nice*, but if the writes physically cannot be persisted anyway, you may as well let the application know that its data got lost.

Have you *looked* at what happens to data once the write() call returns? The reality is that the kernel doesn't have a clue which application needs to be told, nor how to tell it.

It gets even worse once network/raid/luks/integrity/blahblah gets involved. As a simple example, let's say you're writing a file of one block to a ten-disk raid array. You need to read 40k from disk, recompute checksums, and write the whole lot back. If THAT goes wrong, how do you tell the application it just trashed some data that was written six months ago ... ?

Okay, that's a bit extreme, but once the application has launched the data on its journey to disk, it's very hard to work out some sane way to pass an error back up the unpredictable path the data has taken.

Cheers,
Wol

Adding an in-kernel TLS handshake

Posted Jun 3, 2022 15:50 UTC (Fri) by jlayton (subscriber, #31672)

> Have you *looked* at what happens to data once the write() call returns? The reality is that the kernel doesn't have a clue which application needs to be told, nor how to tell it.

Not true, at least not on modern kernels. We track writeback errors in a better way now such that if we get one, it's reported exactly once to fsync/msync on every fd that was open at the time that the error was recorded. Ditto for syncfs(2).
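From the application's side, that means a failed writeback shows up as the return value of fsync(2), not of the write(2) that queued the data. A rough sketch of what checking for it looks like, assuming an ordinary POSIX file descriptor; `write_durably` is a hypothetical helper name used only for illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Write len bytes from buf to fd, then ask the kernel to persist them.
 * Returns 0 on success, or the errno value of the call that failed.
 */
int write_durably(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return errno;
        }
        p += n;                 /* short write: advance and keep going */
        len -= (size_t)n;
    }

    /* write() succeeding only means the page cache accepted the data.
     * A writeback failure (EIO, ENOSPC, ...) is reported here instead. */
    if (fsync(fd) != 0)
        return errno;

    return 0;
}
```

Since the error is reported only once per fd, a caller that gets a nonzero result here should treat the data as possibly lost rather than calling fsync() again and trusting a later success.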

Adding an in-kernel TLS handshake

Posted Jun 3, 2022 15:45 UTC (Fri) by jlayton (subscriber, #31672)

Writeback errors are an option, but not a good one. Most applications can't handle them gracefully, so this usually means that the program dies or something equally awful...and in this case, the problem _should_ be temporary. We really don't want to return writeback errors on fsync unless there really is no other option. As far as close(2) goes, we really ought not return writeback errors to it at all. The only "legitimate" error for close(2) is EBADF.