That is one way to look at it, but...

Posted Jan 22, 2015 10:03 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)
In reply to: When real validation begins by JdGordy
Parent article: When real validation begins

There is a rather wide continuum of reasonable assumptions lying between your "assume your code is too buggy for anyone" and your "assume it is safe for everyone". One such reasonable assumption is that your code is safe enough for people who really badly need it, but not safe enough for people who do not need it quite so badly. This assumption might lead you to enable your code for people who really badly need it and disable it for the rest. Then, over time, as the people who really badly need the code find bugs in it and these bugs are fixed, it might (or might not) make sense to enable your code for additional classes of users, for example, those who have only moderate need of it.

It all depends on what fraction of the users need the new code, how much risk it poses to users not needing the new code, how aggressive your user base is, and how much effort has been put into validating the new code. In this case, a rather small fraction of the users needed the new code, there was moderate risk to other users, many of the other users were anything but aggressive, and my validation had (intentionally) not covered these other users' workloads. Not always an easy decision!

That is one way to look at it, but...

Posted Jan 22, 2015 14:22 UTC (Thu) by error27 (subscriber, #8346)

We really don't have a process for gradual rollouts. I think people barely test anything that's not in a distro kernel.

That is one way to look at it, but...

Posted Jan 22, 2015 15:05 UTC (Thu) by Limdi (guest, #100500)

How would a gradual rollout work?

# apt-kernel variant /1/
apt-kernel install <labels> <version>

vendor=debian.org
variant=stable,testing,pristine-upstream,barebone
name=linux

apt-kernel install variant=stable 3.19.1
apt-kernel install variant=testing 3.20-RC1
apt-kernel install vendor=debian.org,variant=pristine-upstream 3.20-RC2
apt-kernel install vendor=debian.org,variant=lightpatched 3.20-RC2
apt-kernel install vendor=debian.org,variant=barebone 3.20-RC2

# apt-kernel variant /2/
apt-kernel install <labels> <kernel-name>

vendor = debian.org(default) -> tells where to get it from
rc(release-candidate) = [1-10]
version = 3.19.1|latest
author = linus|linuxteam
variant=stable(default),testing,unstable,experimental

apt-kernel install vendor=debian.org,author=linuxteam,version=latest linux
apt-kernel install vendor=lkml.org,author=linus,version=latest linux

# apt-kernel-modules install <labels> <which-module>
One could expect the current kernel to be already installed.

version => for which kernel to install
rc => release-candidate
vendor=debian.org(default)

apt-kernel-modules install version=3.20,rc=4 overlayfs

## Install variant based on the existence of a feature.
In case the module with this feature rises from experimental to unstable/testing, these versions could be used.

apt-kernel-modules install features=multiple-ro overlayfs

Then one could choose the default variant, either via dpkg-reconfigure or via some config:
default-variant=experimental,unstable,testing,stable

And update the installed kernels/kernel modules maybe via
apt-kernel update
apt-kernel upgrade *

Just an idea written down. Something like that would maybe make it easier to use experimental kernels and kernel modules.

What do you think of it?

That is one way to look at it, but...

Posted Jan 22, 2015 19:20 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)

I have to defer to others who understand packaging and installation much better than do I.

That is one way to look at it, but...

Posted Jan 23, 2015 8:03 UTC (Fri) by error27 (subscriber, #8346)

Kernel modules are already modular and you don't have to turn them on if you don't use them. The problem with the other options is that you have to recompile everything for that. So if you have A and B you have to compile four kernels: "disabled", "A", "B" and "A + B". With three boolean options you need eight kernels, and so on: it's 2^(number of options).
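To make the arithmetic concrete, here is a minimal Python sketch that enumerates the builds needed for a set of independent boolean options. The function name `kernel_variants` and the option names are purely illustrative; nothing here reflects how any distribution actually builds kernels.

```python
from itertools import product

def kernel_variants(options):
    """Return one label per on/off combination of the given boolean options."""
    variants = []
    for choice in product([False, True], repeat=len(options)):
        enabled = [name for name, on in zip(options, choice) if on]
        variants.append(" + ".join(enabled) if enabled else "disabled")
    return variants

two = kernel_variants(["A", "B"])
three = kernel_variants(["A", "B", "C"])
print(two)         # the four builds named above, in enumeration order
print(len(three))  # 2**3 builds
```

Each additional boolean option doubles the number of prebuilt kernels, which is why shipping every combination stops being practical very quickly.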

Your idea is basically to make it easier for people to get exactly the kernel they want. I think at that point people need to compile their own kernels. The problem is that "make menuconfig/xconfig" is total garbage so configuring your kernel is crazy difficult.

I can never find anything. Back in the day, my problem was that I couldn't figure out how to enable Broadcom wireless drivers. I could see five drivers and I wasn't positive which one supported my hardware. And there was some overlap because we had the reverse-engineered drivers and the ones that Broadcom wrote later. In the end, the driver I wanted was "invisible" because I didn't have the BCMA bus enabled. In those days BCMA wasn't shown under wireless; it was in a completely different menu.

Last week I wanted to enable lustre to compile-test a file. I couldn't find it because it was invisible. It depends on BROKEN. I still can't find how to enable BROKEN. (It might be deliberate so people are forced to edit Kconfig files to enable broken stuff.)

There is a search feature in menuconfig which tells you the location, but there isn't a "search in page" and quite a few of these lists of drivers are three pages long. If they were in alphabetical order that might help.

Very few people manually configure kernels. Kernel developers like me just generate their configs by using custom scripts. Menuconfig doesn't really have a maintainer but it's such an important tool.
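As an illustration of what such a config-generating script might look like, here is a minimal Python sketch that forces a few symbols on or off in the text of a .config file. It is an assumption about the general shape of these scripts, not anyone's actual tooling: the helper names `set_options` and `render` are hypothetical, and the symbols are just examples taken from this thread. It does not resolve dependencies, so the result would still need to go through something like `make olddefconfig`.

```python
def render(symbol, on):
    """Format one symbol the way .config files spell enabled/disabled options."""
    return f"CONFIG_{symbol}=y" if on else f"# CONFIG_{symbol} is not set"

def set_options(config_text, wanted):
    """Return config_text with every symbol in `wanted` forced on or off.

    `wanted` maps a symbol name (without the CONFIG_ prefix) to True or False.
    Symbols not already present are appended at the end.
    """
    remaining = dict(wanted)
    out = []
    for line in config_text.splitlines():
        symbol = None
        if line.startswith("CONFIG_") and "=" in line:
            symbol = line[len("CONFIG_"):].split("=", 1)[0]
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            symbol = line[len("# CONFIG_"):-len(" is not set")]
        if symbol in remaining:
            out.append(render(symbol, remaining.pop(symbol)))
        else:
            out.append(line)
    for symbol, on in remaining.items():
        out.append(render(symbol, on))
    return "\n".join(out) + "\n"

base = "CONFIG_BCMA=m\n# CONFIG_NO_HZ_FULL is not set\nCONFIG_SMP=y\n"
new = set_options(base, {"NO_HZ_FULL": True, "BCMA": False, "RCU_NOCB_CPU": True})
print(new)
```

A script like this sidesteps the menu hierarchy entirely, which is part of why an option hidden behind an unmet dependency is less of an obstacle for people who never open menuconfig.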

Automation!!! It is the only way...

Posted Jan 23, 2015 9:57 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624)

Indeed, all your points are valid reasons why most people don't bother building their own kernels.

And that is why I made the RCU callback offloading code automatically determine at boot time whether NO_HZ_FULL needs it or not. With that in place, most NO_HZ_FULL users simply don't need to worry about RCU callback offloading. It enables itself when they need it and only on the CPUs that they need it on, and stays out of the way otherwise.

Or at least that is the theory. We will soon see how it plays out in practice.

But regardless of whether or not I have additional bugs in this code (and Murphy of course says that I do), where reasonable we do need to try to automate configuration. And much else besides. ;-)

That is one way to look at it, but...

Posted Jan 23, 2015 17:37 UTC (Fri) by BenHutchings (subscriber, #37955)

> There is a search feature in menuconfig which tells you the location, but there isn't a "search in page" and quite a few of these lists of drivers are three pages long. If they were in alphabetical order that might help.

nconfig is the new menuconfig; it has both global search for symbols and incremental search for labels within the current menu.

First I had heard of "make nconfig"

Posted Jan 23, 2015 21:59 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624)

Very nice, thank you!

That is one way to look at it, but...

Posted Jan 29, 2015 14:44 UTC (Thu) by nix (subscriber, #2304)

Doesn't seem terribly functional to me. F1 to F4 (allegedly Help, SymInfo, Help 2 and ShowAll) serve to go back or quit: F9 (allegedly Quit) does nothing.

I bet it's my terminal (old Konsole, TERM=xterm-color) or something. I guess I'll have to do some debugging...

That is one way to look at it, but...

Posted Jan 29, 2015 15:50 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)

There is certainly room for improvement, as always. For example, search works nicely, but it would be even better to be able to directly change the values of the things found by the search, as can be done with xconfig. Still, nconfig is much faster than xconfig, which is welcome.

F9 works for me. I must confess that I didn't try most of the others.

That is one way to look at it, but...

Posted Jan 29, 2015 22:15 UTC (Thu) by vbabka (subscriber, #91706)

With make menuconfig, search results have prefixes like (1), (2), etc. Press the corresponding number key and it jumps directly to the option.

That is one way to look at it, but...

Posted Jan 22, 2015 19:19 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)

To your point, I don't know of a general approach to gradually roll out new functionality. There have been several attempts in the past (such as CONFIG_EXPERIMENTAL), but these were often worked around. However, that doesn't mean that we cannot come up with specific approaches as needed for specific situations.

For example, in the case covered by this LWN article, the (eventual!) gradual rollout approach was to disable the relevant portions of the new functionality unless the user explicitly passed in a particular boot-time kernel parameter. Over time, we might be less restrictive about exposing new functionality, perhaps enabling it for additional use cases as they arise.

That is one way to look at it, but...

Posted Jan 22, 2015 22:44 UTC (Thu) by error27 (subscriber, #8346)

We could introduce a new config option to turn them on automatically.

config MY_OPTION
	bool
	depends on EXPERIMENTAL || VERSION > "3.21"

Right now the Kconfig parser only understands = and != so we'd have to update it to understand '>'. I'm brainstorming here so there are no bad ideas by the way, in case you were wondering. ;)
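One detail any such '>' operator would have to settle is that version strings cannot be compared as plain strings. The Python sketch below illustrates the point; it is a hypothetical model of the proposed `depends on` line, not a description of how the Kconfig parser works, and the names `version_tuple` and `option_enabled` are made up for this example.

```python
def version_tuple(version):
    """Turn '3.21' or '3.19.1' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in version.split("."))

def option_enabled(kernel_version, experimental, threshold="3.21"):
    """Model of: depends on EXPERIMENTAL || VERSION > "3.21" (hypothetical)."""
    return experimental or version_tuple(kernel_version) > version_tuple(threshold)

# Plain string comparison puts 3.9 after 3.21, which is wrong for versions.
print("3.9" > "3.21")                                  # True
print(version_tuple("3.9") > version_tuple("3.21"))    # False

print(option_enabled("3.19.1", experimental=False))    # False
print(option_enabled("3.22", experimental=False))      # True
print(option_enabled("3.19.1", experimental=True))     # True
```

Comparing component by component as integers gives the ordering people expect from release numbers.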

That is one way to look at it, but...

Posted Jan 23, 2015 0:31 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624)

That might work in some cases, but in the case in this LWN article, the determining factor wasn't the kernel version, but rather the type of workload. People with certain types of HPC or realtime workloads will specify the nohz_full= kernel boot parameter, so I can key off of that to enable RCU callback offloading.
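The idea of keying off the boot parameter can be sketched roughly as follows. This Python fragment only illustrates the decision, assuming a command line containing a `nohz_full=` CPU list such as `1-3,5`; it is not the kernel's implementation, and the function names `parse_cpu_list` and `offloaded_cpus` are hypothetical.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '1-3,5' into a sorted list of CPU numbers."""
    cpus = set()
    for chunk in spec.split(","):
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)

def offloaded_cpus(cmdline):
    """Return the CPUs that would get callback offloading, or [] if none.

    Offloading is enabled only when the user asked for nohz_full=,
    and then only on the CPUs named there.
    """
    for arg in cmdline.split():
        if arg.startswith("nohz_full="):
            return parse_cpu_list(arg[len("nohz_full="):])
    return []

print(offloaded_cpus("root=/dev/sda1 ro nohz_full=1-3,5 quiet"))  # [1, 2, 3, 5]
print(offloaded_cpus("root=/dev/sda1 ro quiet"))                  # []
```

Users who never pass the parameter get an empty list, so the new code path stays out of their way.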

There was a long email thread some years back on how to keep normal users from using new experimental functionality. Dave Jones suggested making the experimental feature splat on boot as the most reliable way to keep most distros from turning it on by default. ;-)

That is one way to look at it, but...

Posted Jan 29, 2015 14:56 UTC (Thu) by nix (subscriber, #2304)

No no. Make it detect when it's being run by a distro QA team, and *crash* on boot. Anything else will get overlooked. :)

That is one way to look at it, but...

Posted Jan 29, 2015 15:52 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624)

You do seem to have fully internalized the old saying "Murphy was an optimist"! ;-)

That is one way to look at it, but...

Posted Feb 4, 2015 15:06 UTC (Wed) by nix (subscriber, #2304)

I was being snarky. Excessively so: the QA people I work with (for Oracle Linux) are better than any other QA people I have ever worked with in any previous job. I can rely on them to find bugs! I can rely on them to proactively think of evil ways to break stuff to find more bugs! This is, to me, amazing: coworkers better at breaking my own stuff than I am, and I don't need to argue for weeks to convince them to do it, either.

(I'm sure all Linux vendors have similarly good QA teams -- I'm just only familiar with the one, and perhaps have had my expectations unduly lowered by awful QA teams in other jobs. I'm sure in e.g. aerospace the QA is even more effective.)

That is one way to look at it, but...

Posted Feb 4, 2015 15:43 UTC (Wed) by PaulMcKenney (✭ supporter ✭, #9624)

"If it is not broken, fix your tests!!!" ;-)