We can use weight decay to avoid overfitting when training a neural network. The method is commonly applied with gradient descent learning and Bayesian learning, but I want to combine it with scaled conjugate gradient. Is that possible? And how does it affect the weight update?
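For reference, here is a sketch of what weight decay would change, assuming the common L2 form of the penalty. The symbols $E$ (the unregularized error), $\lambda$ (the decay coefficient), and $w$ (the weight vector) are notation introduced here, not taken from any particular library:

$$\tilde{E}(w) = E(w) + \frac{\lambda}{2}\,\lVert w \rVert^2, \qquad \nabla \tilde{E}(w) = \nabla E(w) + \lambda w$$

Under that assumption, a gradient-based optimizer such as scaled conjugate gradient would simply receive $\tilde{E}$ and $\nabla \tilde{E}$ in place of $E$ and $\nabla E$, so every search direction picks up an extra $\lambda w$ term that pulls the weights toward zero.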
If the function to be optimized is (locally) convex enough, then you can use the gradient only as a hint for the direction in which to apply the update. For example, you can keep only the sign of each gradient coordinate, use the conjugate gradient in the hope that it converges faster, or find the optimal $\eta$ at each step, etc. Does that mean something to you? – reuns May 04 '16 at 01:26
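The three options in the comment above can be sketched on a toy problem. This is a minimal illustration, not a neural network and not Møller's scaled conjugate gradient: it assumes a convex quadratic loss with an L2 weight decay term, and uses plain linear conjugate gradient. All names (`loss_and_grad`, `sign_step`, `line_search_step`, `conjugate_gradient`, `A`, `b`, `lam`) are hypothetical and chosen only for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def loss_and_grad(w, A, b, lam):
    """Toy loss 0.5*w'Aw - b'w plus weight decay 0.5*lam*||w||^2 (assumed L2 form)."""
    Aw = matvec(A, w)
    loss = 0.5 * dot(w, Aw) - dot(b, w) + 0.5 * lam * dot(w, w)
    # Weight decay adds lam * w to the gradient.
    grad = [Aw[i] - b[i] + lam * w[i] for i in range(len(w))]
    return loss, grad

def sign_step(w, grad, eta):
    """Keep only the sign of each gradient coordinate."""
    sign = lambda x: (x > 0) - (x < 0)
    return [wi - eta * sign(gi) for wi, gi in zip(w, grad)]

def line_search_step(w, grad, A, lam):
    """Steepest descent with the optimal eta, which is exact for a quadratic."""
    Hg = [a + lam * g for a, g in zip(matvec(A, grad), grad)]  # (A + lam*I) g
    eta = dot(grad, grad) / dot(grad, Hg)
    return [wi - eta * gi for wi, gi in zip(w, grad)], eta

def conjugate_gradient(w, A, b, lam, iters):
    """Plain linear CG on the regularized quadratic (not scaled CG)."""
    _, g = loss_and_grad(w, A, b, lam)
    r = [-gi for gi in g]          # residual = negative gradient
    d = list(r)                    # first direction is steepest descent
    for _ in range(iters):
        rr = dot(r, r)
        if rr < 1e-24:             # already converged
            break
        Hd = [a + lam * di for a, di in zip(matvec(A, d), d)]
        alpha = rr / dot(d, Hd)    # exact step length along d
        w = [wi + alpha * di for wi, di in zip(w, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        beta = dot(r, r) / rr      # Fletcher-Reeves coefficient
        d = [ri + beta * di for ri, di in zip(r, d)]
    return w

# Example setup: a 2x2 symmetric positive definite problem.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
w_decay = conjugate_gradient([0.0, 0.0], A, b, lam=0.5, iters=2)
w_plain = conjugate_gradient([0.0, 0.0], A, b, lam=0.0, iters=2)
```

In this sketch the only place weight decay enters is the `lam * w` term in the gradient and the matching `lam * d` term in the curvature product, so the conjugate gradient iteration itself is unchanged. Comparing `w_decay` with `w_plain` shows the regularized solution having the smaller norm.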