
Rethinking splice()

Posted Feb 27, 2023 3:14 UTC (Mon) by ringerc (subscriber, #3071)
In reply to: Rethinking splice() by joib
Parent article: Rethinking splice()

Not only that, but a flags argument in which the low bits are "ignore if flag unrecognised" and the high bits are "syscall should fail if the flag bits are unrecognised".

This matters given the number of times we've had issues with adding flags that change important semantics, where the syscall has no way to say "IDK what your setting flag bit 7 means, hope it isn't important".
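A minimal sketch of what such a split flags word could look like on the checking side. Everything here is an assumption for illustration: the 16/16 split, the mask names, and `check_split_flags()` are hypothetical and do not correspond to any existing syscall.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout: low 16 bits are optional, high 16 bits are mandatory. */
#define FLAGS_OPTIONAL_MASK  0x0000ffffu
#define FLAGS_MANDATORY_MASK 0xffff0000u

/* Hypothetical set of bits that this imaginary kernel version understands. */
#define KNOWN_OPTIONAL  0x00000003u
#define KNOWN_MANDATORY 0x00010000u

/*
 * Returns 0 and stores the flags that will actually be honoured in
 * *effective, or -EINVAL if an unrecognised mandatory bit is set.
 * Unrecognised optional bits are silently dropped.
 */
static int check_split_flags(uint32_t flags, uint32_t *effective)
{
    uint32_t mandatory = flags & FLAGS_MANDATORY_MASK;
    uint32_t optional = flags & FLAGS_OPTIONAL_MASK;

    if (mandatory & ~KNOWN_MANDATORY)
        return -EINVAL;

    *effective = mandatory | (optional & KNOWN_OPTIONAL);
    return 0;
}
```

With this layout, an unknown low bit such as 0x80 is ignored, while an unknown high bit such as 0x20000 makes the call fail.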



Rethinking splice()

Posted Feb 27, 2023 11:53 UTC (Mon) by paulj (subscriber, #341) [Link] (12 responses)

No, just use 2 separate flags arguments. One for "optional, proceed if unrecognised" and the other for "mandatory, fail if unrecognised".

Rethinking splice()

Posted Feb 27, 2023 12:29 UTC (Mon) by johill (subscriber, #25196) [Link] (10 responses)

This basically doesn't work. If any of your userspace is something like

unsigned int flags1, flags2 = 0;  /* only flags2 is initialised; flags1 holds garbage */
syscall(..., &flags1, &flags2);

then you've basically painted yourself into the corner of not being able to use flags1 for anything of interest in the future?

Rethinking splice()

Posted Feb 27, 2023 12:37 UTC (Mon) by paulj (subscriber, #341) [Link] (9 responses)

mandatory fail if unrecognised - for a set bit, obviously.

Rethinking splice()

Posted Feb 27, 2023 14:21 UTC (Mon) by johill (subscriber, #25196) [Link] (8 responses)

I meant flags1 for the "optional, proceed if unrecognised" part. I don't see how you can really do "optional, proceed if unrecognised" at all, since applications might just erroneously set random bits in there (as in the example); I was just trying to illustrate why not.

Rethinking splice()

Posted Feb 27, 2023 16:22 UTC (Mon) by paulj (subscriber, #341) [Link] (7 responses)

If apps are specifying flag arguments with undefined values, well... they're going to get undefined behaviour (sooner or later) - tough for them I'd say.

That's no different from today, where we have syscalls with flags that are not yet defined and their value (as yet) unchecked; or flags that are defined for future use but not yet implemented (and not checked), is it?

GIGO.

Rethinking splice()

Posted Feb 27, 2023 16:38 UTC (Mon) by farnz (subscriber, #17727) [Link] (6 responses)

That conflicts with "don't break userspace". I built a perfectly working binary on Linux 6.4, which sets flags to 0x100 - a bad value, since the only currently defined value is 0. I upgrade to Linux 7.1, and it still works, since the defined flags values don't yet interpret 0x100. When I upgrade from 7.1 to 7.2, flags 0x100 is given a meaning, and my binary breaks. As far as Linus is concerned, that's a kernel regression, and you need to revert the feature that makes sense of flags value 0x100, and find a value that userspace doesn't set.

Ultimately, this forces kernel developers to check all parameter values are set to something meaningful, and to error if any of the values are either not valid, or valid but not understood by this kernel. That way, my binary fails on Linux 6.4 as well as on 7.1, and stops failing on 7.2 - and Linus agrees that "binary used to fail with EINVAL, now works" is not a regression.
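To make the two policies concrete, here is a small sketch, assuming a made-up call whose older kernel defines no flags and whose newer kernel defines 0x100. The names `lax_call()`, `strict_call()` and the `KNOWN_FLAGS_*` masks are hypothetical; this only models the flag check, not a real syscall.

```c
#include <errno.h>

/* Hypothetical masks: the older kernel defines no flags, the newer one defines 0x100. */
#define KNOWN_FLAGS_OLD 0x000u
#define KNOWN_FLAGS_NEW 0x100u

/* Lax policy: bits outside the known mask are silently dropped, the call succeeds. */
static int lax_call(unsigned int flags, unsigned int known, unsigned int *acted_on)
{
    *acted_on = flags & known;
    return 0;
}

/* Strict policy: any bit outside the known mask fails with -EINVAL. */
static int strict_call(unsigned int flags, unsigned int known, unsigned int *acted_on)
{
    if (flags & ~known)
        return -EINVAL;
    *acted_on = flags;
    return 0;
}
```

Under the lax policy a binary passing 0x100 succeeds on both kernels but its behaviour changes once the bit gains a meaning. Under the strict policy the same binary fails on the older kernel and only starts working on the newer one.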

Rethinking splice()

Posted Feb 28, 2023 11:08 UTC (Tue) by paulj (subscriber, #341) [Link] (4 responses)

Fair enough, that works too.

In networking, it is common to allow for values that are optional, and not per se understood by the recipient - who may just ignore them. And values that are mandatory to understand, so the recipient must give some error if not recognised.

Optional values allow a protocol to be extended with optional and wholly backwards compatible features, so that newer speakers with the feature happily co-exist with speakers without it. While 2 "newer" speakers presumably derive some benefit from both supporting the feature. I guess it's more rare in software (Linux especially) to have an application compiled with some such feature, and run it on some older kernel/library-stack that lacks it.

Another way to achieve the latter - rather than explicitly having 2 classes of flags - is to specify that unused fields "Must Be Zero". If such a bit is set it's an error. If such a bit is then repurposed in an update, and it is used by a new speaker with an old speaker, then the old speaker raises an error - the new speaker can try again without. 2 new speakers speaking to each other just happily use the new meaning of the formerly "Must Be Zero" flag.

What you're saying is Linux kernel userspace-API flags must always be of the "MBZ" kind - there is no need for optional. Fair enough. :)
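The "Must Be Zero" retry pattern described above could be sketched roughly as follows. The peers are modelled as plain functions, and `FLAG_BASE`, `FLAG_NEW`, `old_peer()`, `new_peer()` and `send_with_fallback()` are all hypothetical names invented for the illustration.

```c
#include <errno.h>

#define FLAG_BASE 0x1u  /* hypothetical: understood by every speaker */
#define FLAG_NEW  0x2u  /* hypothetical: formerly MBZ, repurposed in an update */

typedef int (*peer_fn)(unsigned int flags);

/* Old speaker: FLAG_NEW is still "Must Be Zero", so setting it is an error. */
static int old_peer(unsigned int flags)
{
    return (flags & ~FLAG_BASE) ? -EINVAL : 0;
}

/* New speaker: understands the repurposed bit as well. */
static int new_peer(unsigned int flags)
{
    return (flags & ~(FLAG_BASE | FLAG_NEW)) ? -EINVAL : 0;
}

/*
 * Try with the new bit first; if the peer rejects it, retry without.
 * On success, *used holds the flags the peer accepted.
 */
static int send_with_fallback(peer_fn peer, unsigned int *used)
{
    unsigned int flags = FLAG_BASE | FLAG_NEW;
    int ret = peer(flags);

    if (ret == -EINVAL) {
        flags = FLAG_BASE;
        ret = peer(flags);
    }
    if (ret == 0)
        *used = flags;
    return ret;
}
```

The retry costs one extra round trip against an old speaker, and the fallback path is extra code that has to be written and tested.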

Rethinking splice()

Posted Feb 28, 2023 17:29 UTC (Tue) by farnz (subscriber, #17727) [Link] (3 responses)

The tradeoff is different between networking and the kernel, too. A program runs for milliseconds through to months on the same kernel, and the RTT to the kernel is on the order of 1 microsecond. Doing 10 RTTs to determine what features are supported and choosing fallbacks isn't significant time compared to the runtime of the program - especially since programs that use new features and have fallbacks for older kernels are likely to be long-running programs.

In contrast, in the networking world, RTTs are higher (milliseconds, not microseconds, most of the time), and connection lifetime is shorter on the high end (connections for more than a day are unusual). Doing 10 RTTs to determine the feature set of the other end is a significant cost when you'll only have the same remote for a few hours at most, and more likely for seconds at a time.

Rethinking splice()

Posted Mar 1, 2023 11:03 UTC (Wed) by paulj (subscriber, #341) [Link] (2 responses)

They're still protocols for different entities to communicate and achieve something, end of the day. ;)

The fallback thing, the problem is the entity asking for the optional enhancement, that could otherwise be ignored, often will not implement the fallback path. So with a hard fail, the entire thing may fail. You need more logic to make the "nice to have, but optional and can be ignored" thing work reliably. The test matrix gets bigger (and bigger and bigger, with each such option).

Just having it silently ignored if communicated to an entity that doesn't know it is simpler, and can not have fallback path bugs.

Trade-offs in all directions. ;)

Rethinking splice()

Posted Mar 1, 2023 11:22 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

And to add another layer to the tradeoffs (one that's changed over time, to boot), in today's world it's often easier to not bother with the new feature at all until you can guarantee that all the hosts your application runs on have the new kernel feature, whereas it's often hard to get all the remote endpoints of a service you depend upon upgraded to new networking features.

This will change again, but for now, that's where we sit.

Rethinking splice()

Posted Mar 1, 2023 17:02 UTC (Wed) by paulj (subscriber, #341) [Link]

Yeah, software often doesn't care about this kind of compatibility. Except when it comes to systems software and features critical for booting. Then you need to think about forward and backward compatibility - at least in the Linux world.

Rethinking splice()

Posted Mar 1, 2023 22:12 UTC (Wed) by nix (subscriber, #2304) [Link]

Rethinking splice()

Posted Feb 27, 2023 15:38 UTC (Mon) by farnz (subscriber, #17727) [Link]

You basically cannot do "optional, proceed if unrecognised" sanely. If you do, you have no way of distinguishing "app passed a non-zero flags value because it never went wrong testing on older kernels" from "app passed a non-zero flags value because it knows about the new meaning of this value".

What you do need is a very clear way for the app to ask what flags values are known about - so that the app can test all the combinations it wants to use at start-up, fail early if the kernel doesn't support anything appropriate, and choose fallbacks if the kernel support is sub-optimal (e.g. older kernel).
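As a rough sketch of that start-up probing, assuming the kernel rejects unknown flags with EINVAL: the app walks its preferred flag combinations in order and keeps the first one that is accepted. `pick_supported()`, the `probe_fn` callback and the two mock kernels are hypothetical stand-ins, not an existing interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical probe: returns 0 if the kernel accepts this flags value. */
typedef int (*probe_fn)(unsigned int flags);

/* Mock older kernel: only bit 0x1 is defined. */
static int mock_old_kernel(unsigned int flags)
{
    return (flags & ~0x001u) ? -EINVAL : 0;
}

/* Mock newer kernel: bits 0x1 and 0x100 are defined. */
static int mock_new_kernel(unsigned int flags)
{
    return (flags & ~0x101u) ? -EINVAL : 0;
}

/*
 * Walk the candidates in order of preference and return the index of
 * the first one the kernel accepts, or -1 if none is supported (the
 * caller should then fail early).
 */
static int pick_supported(probe_fn probe, const unsigned int *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (probe(candidates[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

Called once at start-up, this lets the app choose a fallback on an older kernel, or exit with a clear error before doing any real work.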

