LWN: Comments on "Machine learning and stable kernels"
https://lwn.net/Articles/764647/

This is a special feed containing comments posted to the individual LWN article titled "Machine learning and stable kernels".

Machine learning and stable kernels (osma, Fri, 14 Sep 2018 20:57:13 +0000)
https://lwn.net/Articles/764850/

Thanks for explaining the details of the NN architecture.

I'm no ML expert, but I have played around with some algorithms, including neural networks. Based on the little understanding I've gathered along the way, this architecture seems more than a little overkill for the task. In particular, the hidden layer is pretty huge. I assume that all input neurons are connected to every hidden-layer neuron, as in a typical feed-forward network. Then you will be calculating more than 900 million weights! No wonder it took that long.

Are you sure you really need such a big hidden layer? In my understanding, the hidden layer is typically somewhere around midway in size (in terms of order of magnitude, not absolute value) between the input layer and the output layer. The idea is that the hidden layer will try to generalize and find patterns in the input, for example identifying inputs that are correlated or whose relationship is important.

Have you tried a smaller hidden layer? I would try this with a hidden layer of size 1000 or so, perhaps even just 100. That could easily be tested on a laptop. You could even omit the hidden layer completely, which amounts to linear optimization and is not as powerful as a real neural network, but might still work fine in this case.

Regressions (sashal, Fri, 14 Sep 2018 06:23:38 +0000)
https://lwn.net/Articles/764789/

Indeed, it would be useful to catch buggy patches early on.

See https://lwn.net/ml/linux-kernel/20180501163818.GD1468@sasha-vm/ and the follow-up thread https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005160.html .

Regressions (abatters, Thu, 13 Sep 2018 21:58:16 +0000)
https://lwn.net/Articles/764764/

Thanks for taking the time to look at my example. In this case you are probably correct: a very intelligent but overworked human made a mistake; it happens to us all (and I certainly do appreciate your work, and Greg KH's too). It is certainly true that your automatic process will backport many fixes that humans would have missed. But it is also true that backporting more patches will cause more regressions. This is true whether the backporting is automatic or not. It appears that you are already familiar with this risk: https://lwn.net/Articles/692866/

So here's an idea for your next project: track all the regressions introduced by -stable patches and see if you can use machine learning to prevent future regressions. Now *that* would be awesome.
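Purely as an illustration of the first step such a project would need (the repository path, branch name, and matching heuristic below are assumptions for the sketch, not anything the stable maintainers actually run), one could mine a stable branch for backports that a later stable commit had to fix. Stable backports usually record their mainline hash in a "commit <sha> upstream" line, and later fixes name the buggy mainline commit in a "Fixes:" tag, so cross-referencing the two gives a rough list of backports that brought a bug along with them:

```python
# Rough sketch: find commits in a stable branch that a later commit on the
# same branch declares it is fixing. REPO and BRANCH are illustrative, and
# only the common "commit <sha> upstream" convention is handled.
import re
import subprocess

REPO = "/path/to/linux-stable"      # hypothetical checkout
BRANCH = "linux-4.14.y"             # hypothetical stable branch

upstream_re = re.compile(r"^commit ([0-9a-f]{40}) upstream", re.M | re.I)
fixes_re = re.compile(r"^Fixes:\s*([0-9a-f]{12,40})", re.M | re.I)

log = subprocess.run(
    ["git", "-C", REPO, "log", "--no-merges", "--format=%H%x1f%B%x1e", BRANCH],
    capture_output=True, text=True, check=True).stdout

backports = {}   # mainline sha prefix -> stable sha of the backport
fixes = []       # (stable sha of the fix, mainline sha prefix it fixes)
for entry in log.split("\x1e"):
    if not entry.strip():
        continue
    sha, _, body = entry.partition("\x1f")
    sha = sha.strip()
    upstream = upstream_re.search(body)
    if upstream:
        backports[upstream.group(1)[:12]] = sha
    for buggy in fixes_re.findall(body):
        fixes.append((sha, buggy[:12]))

# Each hit is a backport that a later stable commit had to fix -- raw material
# for the regression statistics suggested above.
for fix_sha, buggy in fixes:
    if buggy in backports:
        print(f"backport {backports[buggy][:12]} (upstream {buggy}) "
              f"was later fixed in stable by {fix_sha[:12]}")
```

A list like this would still need hand-checking (Fixes: tags are not always present or accurate), but it is the kind of labeled data a regression-prediction model would have to start from.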
Regressions (sashal, Thu, 13 Sep 2018 17:21:19 +0000)
https://lwn.net/Articles/764731/

I'm slightly confused by your conclusion. From what I can tell, the commit you pointed out was backported by a human (Greg KH) without any input from this automatic process.

How did you reach the conclusion from the above that the automatic process is bad?

Machine learning and stable kernels (sashal, Thu, 13 Sep 2018 17:19:23 +0000)
https://lwn.net/Articles/764730/

It's about 30k inputs, 30k nodes in the hidden layer, and one output (stable/not stable). The training set is about 260k different commits.

The hardware is a 6-core VM with 12 GiB of RAM and one Nvidia V100 GPU.

Regressions (abatters, Thu, 13 Sep 2018 16:08:25 +0000)
https://lwn.net/Articles/764723/

I have been bitten multiple times by "fixes" being backported to -stable that actually caused regressions instead. Here is one example:

https://marc.info/?l=linux-scsi&m=152227354106242

In this case, a bad "cleanup" patch introduced numerous bugs (including CVE-2017-14991) into mainline and was followed by several different patches to fix the bugs it introduced. I guess someone wanted to backport the CVE fix to -stable (even though -stable wasn't affected), but since the CVE fix depended on the bad "cleanup" patch that introduced the problem, the bad "cleanup" patch was backported as well, along with the additional bugs it brought with it. One of those additional bugs then went unfixed in -stable for several months until I pointed it out. Of course, the bad "cleanup" patch is what introduced the CVE to begin with, so it would have been better if none of the patches had been backported. But figuring that out requires a person to actually read the code, understand it, and make a judgment about it, whereas we seem to be headed in the opposite direction.

Spamassassin? (alanjwylie, Thu, 13 Sep 2018 11:02:45 +0000)
https://lwn.net/Articles/764684/

Patches come as e-mails. Spamassassin takes e-mails and scores them. I'm *so* tempted to see what would happen if I trained Spamassassin, especially its Bayesian component, in this way.

Machine learning and stable kernels (pmarini, Thu, 13 Sep 2018 06:40:33 +0000)
https://lwn.net/Articles/764679/

This is a very interesting use case for neural networks. It would be interesting to know the size of the training dataset (number of rows and number of features), the network architecture (number of hidden layers, number of units in each layer, and so on), the hardware resources, and the GPU configuration. The reason I'm asking is the very long training time of one month.

Thanks!
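To put numbers on the architecture discussed in this thread: osma's roughly 900 million figure follows directly from the sizes Sasha gives above, and the smaller variants osma suggests are orders of magnitude cheaper. A minimal sketch of the arithmetic, assuming plain fully connected layers with biases (only the 30k/30k/1 shape comes from the thread; the smaller shapes are the hypothetical alternatives):

```python
# Back-of-the-envelope parameter counts for fully connected architectures.
# Only the 30k -> 30k -> 1 shape was described in the thread; the smaller
# shapes are the hypothetical alternatives suggested in the comments.
def dense_params(layer_sizes):
    """Weights plus biases for a chain of dense layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

for label, sizes in [
    ("as described (30k -> 30k -> 1)",          [30_000, 30_000, 1]),
    ("smaller hidden layer (30k -> 1000 -> 1)", [30_000, 1_000, 1]),
    ("tiny hidden layer (30k -> 100 -> 1)",     [30_000, 100, 1]),
    ("no hidden layer (logistic regression)",   [30_000, 1]),
]:
    print(f"{label}: {dense_params(sizes):,} parameters")
```

The first configuration comes out at roughly 900 million parameters, matching osma's estimate and explaining the pressure on a single-GPU setup; the 1000-unit variant is about 30 million, small enough to experiment with on a laptop.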
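On alanjwylie's Spamassassin idea above: the same bag-of-words Bayesian approach can be tried without Spamassassin itself. The sketch below uses scikit-learn's multinomial naive Bayes as a stand-in for Spamassassin's Bayes plugin; the patch texts and labels are placeholders, and real training data would be the history of commits that did or did not end up in -stable:

```python
# Toy stand-in for the "train Spamassassin on patches" experiment, using
# scikit-learn's multinomial naive Bayes instead of Spamassassin's Bayes
# plugin. The corpus and labels below are placeholders, not real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical corpus: full patch e-mails (log message plus diff) as text.
train_patches = ["fix null pointer dereference in foo_driver ...",
                 "refactor bar subsystem to use new helper ..."]
train_labels = [1, 0]   # 1 = commit was picked for stable, 0 = it was not

vectorizer = CountVectorizer(lowercase=True)
classifier = MultinomialNB()
classifier.fit(vectorizer.fit_transform(train_patches), train_labels)

# Score a new patch much as a Bayesian spam filter scores a mail.
new_patch = "fix use-after-free in foo_driver teardown path ..."
prob_stable = classifier.predict_proba(vectorizer.transform([new_patch]))[0, 1]
print(f"looks like a stable candidate with probability {prob_stable:.2f}")
```

Whether a word-frequency model can compete with the neural network described in the article is exactly the kind of question such a quick experiment could answer cheaply.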