Machine learning and stable kernels
Posted Sep 14, 2018 20:57 UTC (Fri) by osma (subscriber, #6912)
In reply to: Machine learning and stable kernels by sashal
Parent article: Machine learning and stable kernels
I'm no ML expert, but have played around with some algorithms including neural networks. Based on the little understanding I've gathered along the way, this architecture seems more than a little overkill for the task. In particular the hidden layer is pretty huge. I assume that all input neurons are connected to every hidden layer neuron, as in a typical feed-forward network. Then you will be calculating more than 900 million weights! No wonder it took that long.
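To illustrate where a number that large comes from, here is a minimal sketch of the weight-count arithmetic for one fully connected layer. The layer sizes below are assumptions chosen only so the product lands near the 900 million figure above; the actual network dimensions are whatever sashal described.

```python
# Hypothetical layer sizes, for illustration only (not the real
# dimensions of the network under discussion).
n_input = 30_000   # assumed input layer size
n_hidden = 30_000  # assumed hidden layer size

# In a fully connected feed-forward layer, every input neuron
# connects to every hidden neuron, so the weight count is simply
# the product of the two layer sizes (biases add n_hidden more).
weights = n_input * n_hidden
print(weights)  # 900000000
```

The point is that the weight count grows as the product of adjacent layer sizes, so two merely "large" layers already yield hundreds of millions of parameters.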
Are you sure you really need such a big hidden layer? In my understanding, the hidden layer size typically sits somewhere around midway between the input and output layer sizes (in terms of order of magnitude, not absolute value). The idea is that the hidden layer tries to generalize and find patterns in the input, for example identifying inputs that are correlated or whose relationship is important.
Have you tried a smaller hidden layer? I would try this with a hidden layer of size 1000 or so, perhaps even just 100. That could easily be tested on a laptop. You could even omit the hidden layer completely, which reduces the model to a linear one (essentially logistic regression); that is not as powerful as a real neural network, but might still work fine in this case.
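To make the suggestion concrete, here is a sketch of a single-hidden-layer feed-forward net with a hidden layer of 100, using NumPy and hypothetical dimensions (the input size is again an assumption, not the real one). It only runs a forward pass and counts parameters, to show how drastically the weight count shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: same assumed input size as before,
# but a much smaller hidden layer, as suggested above.
n_input, n_hidden, n_output = 30_000, 100, 1

# Weight matrices and biases for one hidden layer plus output.
W1 = rng.normal(scale=0.01, size=(n_input, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_output))
b2 = np.zeros(n_output)

def forward(x):
    # ReLU hidden layer followed by a sigmoid output unit.
    h = np.maximum(0, x @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

# Roughly 3 million parameters instead of 900 million.
params = W1.size + b1.size + W2.size + b2.size
print(params)  # 3000201

y = forward(rng.normal(size=n_input))
print(y.shape)  # (1,)
```

A network this size trains in minutes on a laptop, which makes it cheap to check whether the huge hidden layer actually buys any accuracy.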
