Extending restartable sequences with virtual CPU IDs

Posted Mar 1, 2022 13:24 UTC (Tue) by farnz (subscriber, #17727)
In reply to: Extending restartable sequences with virtual CPU IDs by taladar
Parent article: Extending restartable sequences with virtual CPU IDs

The length of the struct is a version number; each "API level" has a fixed size struct, where you only access elements that you're supposed to given the known struct size.

If user space passes in a struct whose size isn't one of the known values, whether it be very large and invalid, or just from a newer kernel version, the kernel rejects it, same as it would if the version number is wrong.

The extensions are ordered by presence in the kernel; if two extensions want extra data, then that's two separate struct fields, and the struct as a whole grows (it's a struct, not a union). The flags tell the kernel which fields are valid.

The advantage of size as opposed to version number is that C makes it easy to get right. I call the syscall with a pointer to the struct, and sizeof(user-space version of struct), and the compiler will assist me in getting it right (failing if I try to fill in fields I don't have, not letting me give a version number that's larger than the struct). All I have to do is ensure that the compiler can see the right definition for the struct, and I'm golden.
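As a rough illustration of the size-as-version idea, here is a minimal sketch in C. Everything in it is hypothetical: the struct names, the fields, and demo_call() (a user-space stand-in for the kernel side of a syscall) are invented for the example and are not the rseq ABI.

```c
#include <assert.h>   /* static_assert */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument struct, first API level. */
struct demo_args_v1 {
    uint32_t flags;
    uint32_t cpu;
};

/* Hypothetical second API level: same leading fields, one field appended. */
struct demo_args_v2 {
    uint32_t flags;
    uint32_t cpu;
    uint64_t extra;
};

static_assert(sizeof(struct demo_args_v1) < sizeof(struct demo_args_v2),
              "each API level needs its own, larger size");

/*
 * Stand-in for the kernel side of a made-up syscall.
 * Returns the API level implied by `size`, or -EINVAL if the size
 * is not one of the known struct sizes.
 */
int demo_call(const void *uargs, size_t size)
{
    struct demo_args_v2 args;

    if (size != sizeof(struct demo_args_v1) &&
        size != sizeof(struct demo_args_v2))
        return -EINVAL;

    /* Fields the caller did not supply stay zero. */
    memset(&args, 0, sizeof(args));
    memcpy(&args, uargs, size);

    return size == sizeof(struct demo_args_v1) ? 1 : 2;
}
```

A caller built against the older header would pass sizeof(struct demo_args_v1) and could not even name the extra field, which is the compiler assistance described above.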

Extending restartable sequences with virtual CPU IDs

Posted Mar 1, 2022 15:38 UTC (Tue) by Paf (subscriber, #91811)

The - huge - disadvantage is that any version changes must be size modifying. What if there’s a bug or a desire to change the behavior of an existing field? Well, we can’t handle it with versioning unless we want to blow out the size.

Full stop, end of story. Sadness and clever workarounds ensue.

Extending restartable sequences with virtual CPU IDs

Posted Mar 1, 2022 19:27 UTC (Tue) by compudj (subscriber, #43335)

If an existing struct rseq field needs to change its semantics or behavior, then it is no longer struct rseq: it would be named something else, and possibly registered through a new system call or with specific flags set when calling sys_rseq. The extensibility scheme for struct rseq is "append only" on purpose, so user-space applications can rely on having the exposed structure content unchanged in future kernels.

An explicit version number that would be expected to change the semantics of existing struct rseq fields whenever it is bumped would not be practical: an application supporting the current version number could not hope to support newer versions until it is recompiled, which is a no-go in terms of backward compatibility of kernel ABIs exposed to user-space.
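The "append only" rule can be checked mechanically. The sketch below uses two hypothetical layouts (demo_area_v1 and demo_area_v2 are invented for illustration and are not the real struct rseq) and asserts at compile time that the newer one keeps every older field at the same offset, so code built against the old layout keeps working unmodified.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

/* Hypothetical original layout. */
struct demo_area_v1 {
    uint32_t cpu_id;
    uint32_t flags;
};

/* Hypothetical extended layout: fields are only ever appended. */
struct demo_area_v2 {
    uint32_t cpu_id;
    uint32_t flags;
    uint32_t node_id;
};

static_assert(offsetof(struct demo_area_v2, cpu_id) ==
              offsetof(struct demo_area_v1, cpu_id), "cpu_id must not move");
static_assert(offsetof(struct demo_area_v2, flags) ==
              offsetof(struct demo_area_v1, flags), "flags must not move");
static_assert(offsetof(struct demo_area_v2, node_id) >=
              sizeof(struct demo_area_v1), "new fields go after the old ones");

/* Code compiled against v1 reads a (possibly larger) area via the v1 view. */
uint32_t demo_old_reader_cpu(const void *area)
{
    struct demo_area_v1 view;

    memcpy(&view, area, sizeof(view));
    return view.cpu_id;
}
```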

Extending restartable sequences with virtual CPU IDs

Posted Mar 2, 2022 2:20 UTC (Wed) by Paf (subscriber, #91811)

Well, you’ve decided it’s not struct rseq anymore. That’s just something you’ve decided as a definitional line in the sand - it could just as easily be struct rseq v2, with the same layout but different semantics because you decided the earlier semantics were bad.

To the second part: well, yes - you’d have to carry support for multiple versions in the kernel. That’s all it means. Other projects do this all the time.

The opposition to this is just a matter of preferring a new syscall with almost identical semantics, or an extra field which changes semantics (which is what would happen if a major deficiency in the semantics were found), to an explicit versioning scheme. And that’s... it’s a valid preference, though it’s definitely not mine.

I’m not asking you to fight this fight in the kernel, the choice has been made by others, but I do know which side I fall on.

Extending restartable sequences with virtual CPU IDs

Posted Mar 2, 2022 2:21 UTC (Wed) by Paf (subscriber, #91811)

By the way, this is (for my money) exactly the point of a version number - backwards compatibility by supporting multiple versions inside the API provider.

Extending restartable sequences with virtual CPU IDs

Posted Mar 2, 2022 11:37 UTC (Wed) by farnz (subscriber, #17727)

Same layout but different semantics can be covered by a flags field (with the kernel rejecting requests where the flags are unknown); this means that the same fields can be interpreted differently by different kernel versions, depending on which flags you set.
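A minimal sketch of that flags approach, with invented names throughout (demo_req, the DEMO_FLAG_* bits and demo_handle() are hypothetical, and the milliseconds/microseconds split is just an example of "same field, different meaning"):

```c
#include <assert.h>   /* static_assert */
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits. */
#define DEMO_FLAG_VERBOSE   (1u << 0)
#define DEMO_FLAG_NEW_UNITS (1u << 1)   /* reinterprets `value` */
#define DEMO_KNOWN_FLAGS    (DEMO_FLAG_VERBOSE | DEMO_FLAG_NEW_UNITS)

static_assert((DEMO_FLAG_VERBOSE & DEMO_FLAG_NEW_UNITS) == 0,
              "flag bits must not overlap");

/* Hypothetical request: the layout never changes, only its meaning. */
struct demo_req {
    uint32_t flags;
    uint32_t value;
};

/*
 * Stand-in for the kernel side. Stores the request's duration in
 * microseconds in *out_usec and returns 0, or returns -EINVAL if the
 * caller set a flag this "kernel" does not know about.
 */
int demo_handle(const struct demo_req *req, uint64_t *out_usec)
{
    if (req->flags & ~DEMO_KNOWN_FLAGS)
        return -EINVAL;

    if (req->flags & DEMO_FLAG_NEW_UNITS)
        *out_usec = req->value;                   /* new meaning: microseconds */
    else
        *out_usec = (uint64_t)req->value * 1000;  /* old meaning: milliseconds */

    return 0;
}
```

An older kernel that predates DEMO_FLAG_NEW_UNITS would simply leave that bit out of its known-flags mask and reject the request, so the caller learns the new semantics are unavailable rather than having the field silently misread.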

Extending restartable sequences with virtual CPU IDs

Posted Mar 3, 2022 5:02 UTC (Thu) by Paf (subscriber, #91811)

Sure, you can do this with flags - sometimes you end up with a flag that basically says “new version”, but it can be done.

Extending restartable sequences with virtual CPU IDs

Posted Mar 2, 2022 15:59 UTC (Wed) by compudj (subscriber, #43335)

struct rseq is quite different from the usual system call input/output parameters.

struct rseq is meant to: have its fields populated/read by both the kernel and user-space, be allocated by a single "owner" library (e.g. glibc), and be used by the application executable as well as by various shared objects.

So it's not as simple as having the kernel support various versions: there is only a single struct rseq per thread, so all users of rseq within a process (main executable and shared libraries) need to agree on its size and feature set.

Therefore, the solution proposed in the patch set exposes the "feature size" supported by the kernel through auxiliary vectors, which allows glibc to allocate enough memory in the per-thread area, and register that to the kernel through sys_rseq. This way, all rseq users within the process can agree on the size of the supported rseq feature set by looking at both glibc's __rseq_size and the auxiliary vector rseq feature size.
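For readers who want to see what that lookup might look like from user space, here is a hedged sketch. It assumes Linux with glibc's getauxval(), and it assumes the auxiliary vector entry is the one that ended up in the mainline UAPI headers as AT_RSEQ_FEATURE_SIZE (value 27), which may differ from the patch set as posted. The two demo_* helpers are hypothetical, and the registered size is passed in as a plain parameter rather than read from glibc's __rseq_size so the example does not depend on a particular glibc version.

```c
#include <assert.h>     /* static_assert */
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval(), Linux with glibc */

/* Defined here in case the installed libc headers predate this entry. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif

static_assert(AT_RSEQ_FEATURE_SIZE == 27,
              "expected the value used by the Linux UAPI headers");

/*
 * Hypothetical helper: the number of bytes of the per-thread area that
 * every rseq user in the process may touch is bounded both by what the
 * kernel supports and by what the owner library registered.
 */
size_t demo_usable_rseq_size(size_t kernel_feature_size, size_t registered_size)
{
    return kernel_feature_size < registered_size ? kernel_feature_size
                                                 : registered_size;
}

/*
 * Hypothetical helper: ask the kernel for its feature size.
 * getauxval() returns 0 when the entry is absent, i.e. on kernels
 * that do not advertise a feature size at all.
 */
size_t demo_kernel_rseq_feature_size(void)
{
    return (size_t)getauxval(AT_RSEQ_FEATURE_SIZE);
}
```

A library would then only access a field whose offset plus size fits within the value returned by demo_usable_rseq_size().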

If multiple struct rseq instances per thread were a possibility, things would be very different and version numbering would be feasible, but it was decided otherwise for the sake of keeping the kernel implementation simple and time-bounded.

So independently of the preference for version vs size-based extensibility, a version-based extensibility scheme for struct rseq simply won't work, because all user-space binaries linked into a process need to agree on the layout.

Extending restartable sequences with virtual CPU IDs

Posted Mar 3, 2022 5:01 UTC (Thu) by Paf (subscriber, #91811)

Ah, thank you for that clarification - that’s quite an extra ball of complexity. Interesting :o

Extending restartable sequences with virtual CPU IDs

Posted Mar 2, 2022 12:07 UTC (Wed) by smurf (subscriber, #17840)

You could just set a flag bit. Or add a flag field if there isn't one already.