Kernel quality control, or the lack thereof
Posted Dec 8, 2018 1:20 UTC (Sat) by vomlehn (guest, #45588)
Parent article: Kernel quality control, or the lack thereof
Posted Dec 8, 2018 16:45 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (9 responses)
Agreed 200%, this is the core issue:
> > We ended up here because we *trusted* that ...
Either tests already exist and it's just a matter of going the extra mile to automate them and share their results.
Or there's no decent, repeatable and reusable test coverage, and new features should simply not be added until there is. "Thanks, your patches look great; now where are your test results, please?" Not exactly ground-breaking software engineering.
Exceptions could be tolerated for hardware-specific or pre-silicon drivers which require very specific test environments and for which vendors can only hurt themselves anyway. That clearly doesn't seem to be the case for XFS or the VFS.
Validation and automation have a lesser reputation than development and tend to attract less talent. One possible and extremely simple way to address this is to hold the *development* of tests and automation to the same open-source and code-review standards.
Posted Dec 9, 2018 11:17 UTC (Sun)
by iabervon (subscriber, #722)
[Link] (7 responses)
Posted Dec 9, 2018 14:20 UTC (Sun)
by saffroy (guest, #43999)
[Link] (5 responses)
Besides tests themselves, it helps a LOT to have some kind of test coverage report, just to remind you of which parts of the code are never touched by any of your current tests.
Do people publish such coverage reports for the kernel?
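As background (not from the thread itself): the kernel can be built with CONFIG_GCOV_KERNEL, which exposes coverage data under debugfs; the kernel's gcov documentation describes collecting that data with the usual gcov/lcov tools into a tracefile. A rough, illustrative sketch of summarizing per-file line coverage from such an lcov tracefile (the file name kernel.info is just an assumption) might look like this:

/* Rough, illustrative sketch: summarize per-file line coverage from an lcov
 * tracefile. The tracefile name "kernel.info" is an assumption; producing it
 * from a CONFIG_GCOV_KERNEL build is described in the kernel's gcov docs. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("kernel.info", "r");
	char line[1024], file[1024] = "";
	long hit = 0, total = 0;

	if (!f) {
		perror("kernel.info");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "SF:", 3)) {
			/* "SF:<path>" starts a new source-file record. */
			snprintf(file, sizeof(file), "%s", line + 3);
			file[strcspn(file, "\n")] = '\0';
			hit = total = 0;
		} else if (!strncmp(line, "DA:", 3)) {
			/* "DA:<line>,<count>" is one instrumented line. */
			long lineno, count;

			if (sscanf(line + 3, "%ld,%ld", &lineno, &count) == 2) {
				total++;
				if (count > 0)
					hit++;
			}
		} else if (!strncmp(line, "end_of_record", 13) && total) {
			printf("%5.1f%% (%ld/%ld lines) %s\n",
			       100.0 * hit / total, hit, total, file);
		}
	}

	fclose(f);
	return 0;
}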
Posted Dec 10, 2018 9:49 UTC (Mon)
by metan (subscriber, #74107)
[Link] (4 responses)
However, I can pretty much say that the main problems I see are various corner cases that are rarely hit (i.e. mostly failure and error propagation) and drivers. My take on this is that there is no point in doing coverage analysis when the gaps we have are enormous and easy to spot. Just have a look at our current backlog of missing coverage in LTP (https://github.com/linux-test-project/ltp/labels/missing%...); these entries only scratch the surface with the most obviously missing syscalls. We may try to proceed with coverage analysis once we are out of work there, which will hopefully happen at some point.
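To make the shape of such a gap-filling test concrete, here is a rough, illustrative sketch of a minimal test in LTP's C API (tst_test.h), exercising one error-propagation corner case; it is not an actual LTP test, and the exact macros and fields vary between LTP releases:

/* Rough sketch of a minimal LTP-style test; not an actual LTP test.
 * LTP's C API describes a test in a struct tst_test and reports results
 * with tst_res(); exact macros and fields vary between LTP releases. */
#include <errno.h>
#include <unistd.h>
#include "tst_test.h"

static void run(void)
{
	/* Error-propagation corner case: close() on an invalid fd
	 * must fail with EBADF. */
	TEST(close(-1));

	if (TST_RET == -1 && TST_ERR == EBADF)
		tst_res(TPASS, "close(-1) failed with EBADF as expected");
	else
		tst_res(TFAIL, "close(-1) returned %ld, errno %d",
			TST_RET, TST_ERR);
}

static struct tst_test test = {
	.test_all = run,
};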
The problems with corner cases can likely be caught by a combination of unit testing and fuzzing. Driver testing is more problematic, though: there is only so much you can do with qemu and emulated hardware. Proper driver testing needs a reasonably sized lab stacked with hardware, which is much harder to set up and maintain, and that is not going to happen unless somebody invests a reasonable amount of resources into it. But there is light at the end of the tunnel as well: as far as I know, Linaro has a big automated lab stacked with embedded hardware to run tests on, we are trying to tackle an automated server-grade hardware lab here at SUSE, and I'm pretty sure there is a lot more out there that is just not that visible to the general public.
Posted Dec 10, 2018 12:57 UTC (Mon)
by nix (subscriber, #2304)
[Link] (3 responses)
There is no alternative to thinking about these problems, I'm afraid. There is no magic automatable road to well-tested software of this complexity.
Posted Dec 10, 2018 13:14 UTC (Mon)
by metan (subscriber, #74107)
[Link] (2 responses)
Posted Dec 11, 2018 17:37 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 response)
Posted Dec 11, 2018 20:59 UTC (Tue)
by marcH (subscriber, #57642)
[Link]
Posted Dec 9, 2018 17:28 UTC (Sun)
by marcH (subscriber, #57642)
[Link]
Thinking of it, computer security is a bit like... healthcare: extremely opaque and nearly impossible for customers to make educated choices about. From a legal perspective I suspect it's even worse: breach after breach and absolutely zero liability. To top it off, class actions are no longer possible, killed by arbitration clauses in all terms and conditions. Brands might be more useful in security, though.
https://www.google.com/search?q=site%3Aschneier.com+liabi...
Posted Dec 9, 2018 13:32 UTC (Sun)
by mupuf (subscriber, #86890)
[Link]
This is what we do in the i915 community. No feature lands in DRM without a test in IGT, and CI developers are part of the same team.
My view on this is that good quality comes from:
1) Well-written driver code, peer reviewed to catch architectural issues
2) Good tests exercising the main use case and the corner cases; tests are held to the same standard as driver code
3) A good understanding of the CI system that will execute these tests
4) Good follow-up on the bugs filed when these tests fail
Point 1) is already handled pretty well in the Linux community.
Point 2) is hard to justify when tests are not executed, but comes more naturally when there is a good CI system.
Point 3) is probably the biggest issue for the Linux CI systems: the driver usually covers a wide variety of hardware and configurations which cannot all be tested in CI at all times. This leads to complexity in the CI system that needs to be understood by developers in order to prevent regressions. This is why our CI is maintained and developed in the same team that develops the driver.
Point 4) comes pretty naturally when introducing a filtering system for CI failures. Some failures are known and pending fixes, and we do not want these to be considered blocking for a patch series. We have been using bugs to create a forum of discussion for developers to discuss how to fix these issues. These bugs are associated with CI failures by a tool doing pattern matching (https://intel-gfx-ci.01.org/cibuglog/). The problem is that these bugs are now every developer's responsibility to fix, and that requires a change in the development culture: holding up new features until some more important bugs are fixed.
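As a purely illustrative sketch of that filtering idea (not the actual cibuglog code; the patterns and tags below are made up), matching CI log lines against regexes for already-filed issues could look roughly like this:

/* Illustrative only: read CI log lines on stdin and tag those matching a
 * regex for an already-filed ("known") issue, so new failures stand out.
 * This is not the cibuglog implementation; the patterns are made up. */
#include <regex.h>
#include <stdio.h>

/* Hypothetical known-issue signatures; a real system would load these
 * from its bug database. */
static const char *known_issues[] = {
	"GPU HANG.*ecode 9",
	"WARNING:.*drm_atomic_helper",
};
#define NUM_KNOWN (sizeof(known_issues) / sizeof(known_issues[0]))

int main(void)
{
	regex_t res[NUM_KNOWN];
	char line[4096];

	for (size_t i = 0; i < NUM_KNOWN; i++) {
		if (regcomp(&res[i], known_issues[i], REG_EXTENDED | REG_NOSUB)) {
			fprintf(stderr, "bad pattern: %s\n", known_issues[i]);
			return 1;
		}
	}

	while (fgets(line, sizeof(line), stdin)) {
		int known = 0;

		for (size_t i = 0; i < NUM_KNOWN && !known; i++)
			known = !regexec(&res[i], line, 0, NULL, 0);

		/* Known failures get annotated instead of blocking the series. */
		printf("%s %s", known ? "[known]" : "[new]  ", line);
	}

	for (size_t i = 0; i < NUM_KNOWN; i++)
		regfree(&res[i]);

	return 0;
}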
I guess we are getting quite good at CI, and I am really looking forward to the CI team having more time to share our knowledge and tools so that others can replicate them! We have already started working on an open-source toolbox for CI (https://gitlab.freedesktop.org/gfx-ci), as discussed at XDC 2018 (https://xdc2018.x.org/slides/GFX_Testing_Workshop.pdf).
Posted Dec 10, 2018 20:35 UTC (Mon)
by sandeen (guest, #42852)
[Link]
You may wish to subscribe to fstests@vger.kernel.org or peruse git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git if this sort of thing is of interest to you.