LWN: Comments on "Machine learning and stable kernels"
https://lwn.net/Articles/764647/

This is a special feed containing comments posted to the individual LWN article titled "Machine learning and stable kernels".

Machine learning and stable kernels (osma, Fri, 14 Sep 2018 20:57:13 +0000)
https://lwn.net/Articles/764850/

Thanks for explaining the details of the NN architecture.

I'm no ML expert, but I have played around with some algorithms, including neural networks. Based on the little understanding I've gathered along the way, this architecture seems more than a little overkill for the task. In particular, the hidden layer is pretty huge. I assume that all input neurons are connected to every hidden-layer neuron, as in a typical feed-forward network. Then you will be calculating more than 900 million weights! No wonder it took that long.

Are you sure you really need such a big hidden layer? In my understanding, the hidden layer is typically somewhere around midway in size (in terms of order of magnitude, not absolute value) between the input layer and the output layer. The idea is that the hidden layer will try to generalize and find patterns in the input, for example identifying inputs that are correlated or whose relationship is important.

Have you tried a smaller hidden layer? I would try this with a hidden layer of size 1000 or so, perhaps even just 100. That could easily be tested on a laptop. You could even omit the hidden layer completely, which amounts to linear optimization and is not as powerful as a real neural network, but might still work fine in this case.

Regressions (sashal, Fri, 14 Sep 2018 06:23:38 +0000)
https://lwn.net/Articles/764789/

Indeed, it would be useful to catch buggy patches early on.

See https://lwn.net/ml/linux-kernel/20180501163818.GD1468@sasha-vm/ and the follow-up thread https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005160.html .

Regressions (abatters, Thu, 13 Sep 2018 21:58:16 +0000)
https://lwn.net/Articles/764764/

Thanks for taking the time to look at my example. In this case you are probably correct: a very intelligent but overworked human made a mistake; it happens to us all (and I certainly do appreciate your work, and Greg KH's too). It is certainly true that your automatic process will backport many fixes that humans would have missed. But it is also true that backporting more patches will cause more regressions. This is true whether the backporting is automatic or not. It appears that you are already familiar with this risk: https://lwn.net/Articles/692866/

So here's an idea for your next project: track all the regressions introduced by -stable patches and see if you can use machine learning to prevent future regressions. Now *that* would be awesome.
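Purely as an illustration of the first step such a project would need (the repository path, branch name, and matching heuristic below are assumptions for the sketch, not anything the stable maintainers actually run), one could mine a stable branch for backports that a later stable commit had to fix. Stable backports usually record their mainline hash in a "commit <sha> upstream" line, and later fixes name the buggy mainline commit in a "Fixes:" tag, so cross-referencing the two gives a rough list of backports that brought a bug along with them:

```python
# Rough sketch: find commits in a stable branch that a later commit on the
# same branch declares it is fixing. REPO and BRANCH are illustrative, and
# only the common "commit <sha> upstream" convention is handled.
import re
import subprocess

REPO = "/path/to/linux-stable"      # hypothetical checkout
BRANCH = "linux-4.14.y"             # hypothetical stable branch

upstream_re = re.compile(r"^commit ([0-9a-f]{40}) upstream", re.M | re.I)
fixes_re = re.compile(r"^Fixes:\s*([0-9a-f]{12,40})", re.M | re.I)

log = subprocess.run(
    ["git", "-C", REPO, "log", "--no-merges", "--format=%H%x1f%B%x1e", BRANCH],
    capture_output=True, text=True, check=True).stdout

backports = {}   # mainline sha prefix -> stable sha of the backport
fixes = []       # (stable sha of the fix, mainline sha prefix it fixes)
for entry in log.split("\x1e"):
    if not entry.strip():
        continue
    sha, _, body = entry.partition("\x1f")
    sha = sha.strip()
    upstream = upstream_re.search(body)
    if upstream:
        backports[upstream.group(1)[:12]] = sha
    for buggy in fixes_re.findall(body):
        fixes.append((sha, buggy[:12]))

# Each hit is a backport that a later stable commit had to fix -- raw material
# for the regression statistics suggested above.
for fix_sha, buggy in fixes:
    if buggy in backports:
        print(f"backport {backports[buggy][:12]} (upstream {buggy}) "
              f"was later fixed in stable by {fix_sha[:12]}")
```

A list like this would still need hand-checking (Fixes: tags are not always present or accurate), but it is the kind of labeled data a regression-prediction model would have to start from.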
Regressions (sashal, Thu, 13 Sep 2018 17:21:19 +0000)
https://lwn.net/Articles/764731/

I'm slightly confused by your conclusion. From what I can tell, the commit you pointed out was backported by a human (Greg KH) without any input from this automatic process.

How did you reach the conclusion from the above that the automatic process is bad?

Machine learning and stable kernels (sashal, Thu, 13 Sep 2018 17:19:23 +0000)
https://lwn.net/Articles/764730/

It's about 30k inputs, 30k nodes in the hidden layer, and one output (stable/not stable). The training set is about 260k different commits.

The hardware is a 6-core VM with 12 GiB of RAM and one Nvidia V100 GPU.

Regressions (abatters, Thu, 13 Sep 2018 16:08:25 +0000)
https://lwn.net/Articles/764723/

I have been bitten multiple times by "fixes" being backported to -stable that actually caused regressions instead. Here is one example:

https://marc.info/?l=linux-scsi&m=152227354106242

In this case, a bad "cleanup" patch introduced numerous bugs (including CVE-2017-14991) into mainline and was followed by several different patches to fix the bugs it introduced. I guess someone wanted to backport the CVE fix to -stable (even though -stable wasn't affected), but since the CVE fix depended on the bad "cleanup" patch that introduced the problem, the bad "cleanup" patch was backported as well, along with the additional bugs it brought with it. One of those additional bugs then went unfixed in -stable for several months until I pointed it out. Of course, the bad "cleanup" patch is what introduced the CVE to begin with, so it would have been better if none of the patches had been backported. But figuring that out requires a person to actually read the code, understand it, and make a judgment about it, whereas we seem to be headed in the opposite direction.

Spamassassin? (alanjwylie, Thu, 13 Sep 2018 11:02:45 +0000)
https://lwn.net/Articles/764684/

Patches come as e-mails. Spamassassin takes e-mails and scores them. I'm *so* tempted to see what would happen if I trained Spamassassin, especially its Bayesian component, in this way.

Machine learning and stable kernels (pmarini, Thu, 13 Sep 2018 06:40:33 +0000)
https://lwn.net/Articles/764679/

This is a very interesting use case for neural networks. It would be interesting to know the size of the training dataset (number of rows and number of features), the network architecture (number of hidden layers, number of units in each layer, and so on), the hardware resources, and the GPU configuration. The reason I'm asking is the very long training time of one month.

Thanks!
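To put numbers on the architecture discussed in this thread: osma's roughly 900 million figure follows directly from the sizes Sasha gives above, and the smaller variants osma suggests are orders of magnitude cheaper. A minimal sketch of the arithmetic, assuming plain fully connected layers with biases (only the 30k/30k/1 shape comes from the thread; the smaller shapes are the hypothetical alternatives):

```python
# Back-of-the-envelope parameter counts for fully connected architectures.
# Only the 30k -> 30k -> 1 shape was described in the thread; the smaller
# shapes are the hypothetical alternatives suggested in the comments.
def dense_params(layer_sizes):
    """Weights plus biases for a chain of dense layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

for label, sizes in [
    ("as described (30k -> 30k -> 1)",          [30_000, 30_000, 1]),
    ("smaller hidden layer (30k -> 1000 -> 1)", [30_000, 1_000, 1]),
    ("tiny hidden layer (30k -> 100 -> 1)",     [30_000, 100, 1]),
    ("no hidden layer (logistic regression)",   [30_000, 1]),
]:
    print(f"{label}: {dense_params(sizes):,} parameters")
```

The first configuration comes out at roughly 900 million parameters, matching osma's estimate and explaining the pressure on a single-GPU setup; the 1000-unit variant is about 30 million, small enough to experiment with on a laptop.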
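On alanjwylie's Spamassassin idea above: the same bag-of-words Bayesian approach can be tried without Spamassassin itself. The sketch below uses scikit-learn's multinomial naive Bayes as a stand-in for Spamassassin's Bayes plugin; the patch texts and labels are placeholders, and real training data would be the history of commits that did or did not end up in -stable:

```python
# Toy stand-in for the "train Spamassassin on patches" experiment, using
# scikit-learn's multinomial naive Bayes instead of Spamassassin's Bayes
# plugin. The corpus and labels below are placeholders, not real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical corpus: full patch e-mails (log message plus diff) as text.
train_patches = ["fix null pointer dereference in foo_driver ...",
                 "refactor bar subsystem to use new helper ..."]
train_labels = [1, 0]   # 1 = commit was picked for stable, 0 = it was not

vectorizer = CountVectorizer(lowercase=True)
classifier = MultinomialNB()
classifier.fit(vectorizer.fit_transform(train_patches), train_labels)

# Score a new patch much as a Bayesian spam filter scores a mail.
new_patch = "fix use-after-free in foo_driver teardown path ..."
prob_stable = classifier.predict_proba(vectorizer.transform([new_patch]))[0, 1]
print(f"looks like a stable candidate with probability {prob_stable:.2f}")
```

Whether a word-frequency model can compete with the neural network described in the article is exactly the kind of question such a quick experiment could answer cheaply.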