|
|
Subscribe / Log in / New account

Perfection is the enemy of the good?

Perfection is the enemy of the good?

Posted Jun 1, 2024 20:56 UTC (Sat) by roc (subscriber, #30627)
In reply to: Perfection is the enemy of the good? by Wol
Parent article: One more pidfdfs surprise

Design docs are great but they don't help you spot missed details like "didn't mean to set file type to zero".

The only real thing that helps here is more systematic tests. Write down all the invariants you can possibly think of, including "file type must be valid". Create a test harness that creates all possible types of files in all the different filesystems. Test the cross product before every kernel release, at least.

You can still forget to add a test for some invariant that later turns out to be important. Mutation testing and test coverage help with that. You can never completely solve this but certainly you can do a lot better than the kernel is currently doing. The good news is that this kind of testing work parallelizes well (although fixing the bugs you find doesn't).


to post comments

Perfection is the enemy of the good?

Posted Jun 1, 2024 23:01 UTC (Sat) by Wol (subscriber, #4433) [Link] (8 responses)

> Design docs are great but they don't help you spot missed details like "didn't mean to set file type to zero".

Good design docs do.

I bang on about truth tables, but if your design doc says "this is the set of possible filetypes, this is the set of valid filetypes, this is the set of invalid filetypes", it *should* stand out like a sore thumb if you fail to define file type 0. A good design doc should at least enumerate all logical possibilities - even if as I bang on it just says "I haven't covered this area". That way you know if your (re)definition is going to step on someone else's toes :-)

Cheers,
Wol

Perfection is the enemy of the good?

Posted Jun 2, 2024 6:54 UTC (Sun) by pm215 (subscriber, #98099) [Link] (6 responses)

But presumably in this case the design doc would say "0 is not a valid file type". Which might have helped userspace avoid assuming it was intentional, but doesn't help the kernel side -- the problem here AIUI wasn't that the design was unclear, it was a straightforward code bug where it didn't follow the intended design.

Perfection is the enemy of the good?

Posted Jun 2, 2024 7:39 UTC (Sun) by Wol (subscriber, #4433) [Link] (5 responses)

Yes (a) it would have told userspace it was breaking the rules, but ...

Why didn't the kernel follow the design? Because there was no design to follow? So it wasn't actually a bug because nobody thought things through?

The mere existence of a thought-through design massively reduces the chances of things like this being missed.

Not really the same (but it is) in lilypond they had a function that created a rehearsal mark - could be a letter or a number - and put it in a wrapper - could be none, circle or box. There was a random mix of possibilities, and I was forever tripping over stuff I couldn't do. So I proposed something as stupid as "why not do a cross-product of all available marks in all available wrappers?". Some kind soul did, and there's never been a problem since.

Sadly, it's been my *regular* experience that that sort of lack of thought is the *norm*. And it would pre-empt SOOOO many problems.

Cheers,
Wol

Perfection is the enemy of the good?

Posted Jun 2, 2024 17:59 UTC (Sun) by pm215 (subscriber, #98099) [Link] (4 responses)

I think the kernel didn't follow the design because this detail of it (which is probably mandated by posix) interacts unfortunately with the C language, which lets you forget to initialize a struct field and get zero, and won't insist that you fill it with a valid value for the type, because everything's just an integer.

Perfection is the enemy of the good?

Posted Jun 2, 2024 19:33 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> which lets you forget to

And a spec / design document should have provided a simple reminder ...

Cheers,
Wol

Perfection is the enemy of the good?

Posted Jun 2, 2024 21:28 UTC (Sun) by pm215 (subscriber, #98099) [Link] (2 responses)

But there was a spec -- it's posix, which says anything you stat should have a filetype and defines what they are. Having a spec that says "this field must be valid" manifestly didn't avoid the bug.

Perfection is the enemy of the good?

Posted Jun 2, 2024 23:00 UTC (Sun) by intelfx (subscriber, #130118) [Link]

> But there was a spec -- it's posix, which says anything you stat should have a filetype and defines what they are. Having a spec that says "this field must be valid" manifestly didn't avoid the bug.

I think it's in large part because the spec (POSIX) is so out of touch with the reality (what Linux actually implements) that POSIX compliance is a distant second thought at best (and is entirely ignored and left to the userspace, i.e. libc, at worst).

In either case, POSIX does not really seem to inform either development or validation of Linux syscalls. It is not a design document for Linux.

Perfection is the enemy of the good?

Posted Jun 3, 2024 10:26 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link]

SUS demands that st_mode shall have a meaningful value for defined file types which are block devices, character devices, FIFOs, regular files, symlinks and sockets. It doesn't define numerical values for such meaningful file type information, only macros.

Perfection is the enemy of the good?

Posted Jun 6, 2024 2:29 UTC (Thu) by samroberts (subscriber, #46749) [Link]

"design docs" have many permutations. Specifically, I would like to see every kernel API be rejected as a patch unless the patch includes *detailed* documentation, in whatever form the kernel is using for man pages now. Someone correct me if I'm wrong, but I think the manpages project is an after-the-fact catch up process.

If there were standard doc formats for file systems, so that `man pidfdfs` had to return a man page with information such as file type, whether it supports symlinks, what their string format is, etc, etc it would help a lot in reviewing code, and help in reminding authors of the full range of things they need to be considering.

As is, a lot of "code review" has only code patches and email threads. Reviewing code that doesn't have a description of what its supposed to do, and why, is a bit hard. Its also hard for potential consumers of the new APIs. If man pages are provided, many more people can look at the new API being proposed and give feedback before the API becomes and immutable ABI. As is, reviewers have to be familiar enough with kernel code to read the implementation, AND at the same time be a potential user. That's cutting lots of valuable possible review out.

As I recall early Linux, we didn't need docs so much, because much of early Linux wasn't novel. It was supposed to work like other unix-ish systems did, so when reviewing code, that's what we had to look for; does this work the same, does it work with already existing userspace BSD or GNU tools. As we have more and more system that are Linux specific, I don't think we've caught up to requiring docs for code submissions.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds