In "Global Convergence of Block Coordinate Descent in Deep Learning" the authors claim that BCD is a gradient-free method, but in "Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks" the authors calculate the gradient inside the BCD algorithm.
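To make the distinction concrete, here is a minimal sketch of the two flavours of block update I have in mind. The toy objective, the function names, and the step size are my own assumptions for illustration and are not taken from either paper: one variant sets each block to its exact minimizer (no gradient is evaluated at run time), the other takes a gradient step restricted to one block at a time.

```python
# Toy objective (an assumption, not from either paper):
#   f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y
# Each coordinate is treated as one "block".

def f(x, y):
    return (x - 1) ** 2 + (y + 2) ** 2 + x * y


def exact_bcd(sweeps=50):
    """Block update by exact minimization: no gradient is evaluated.

    The closed forms below come from solving df/dx = 0 and df/dy = 0
    by hand, once, before the loop runs.
    """
    x, y = 0.0, 0.0
    for _ in range(sweeps):
        x = 1 - y / 2    # argmin over x with y held fixed
        y = -2 - x / 2   # argmin over y with x held fixed
    return x, y


def gradient_bcd(sweeps=200, lr=0.25):
    """Block update by a gradient step on one block at a time."""
    x, y = 0.0, 0.0
    for _ in range(sweeps):
        x -= lr * (2 * (x - 1) + y)  # partial derivative w.r.t. x
        y -= lr * (2 * (y + 2) + x)  # partial derivative w.r.t. y
    return x, y


if __name__ == "__main__":
    print(exact_bcd())
    print(gradient_bcd())
```

On this toy problem both loops cycle over the blocks and both approach the same minimizer, (8/3, -10/3), so both could reasonably be called block coordinate descent even though only the second one computes a gradient.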
-
Welcome to MSE. Please read this text about how to ask a good question. – José Carlos Santos Nov 09 '21 at 07:17
-
Yes, I have been wondering about this too. I guess it's not exactly gradient-free coordinate descent in this case. – Saurabh7 Nov 26 '21 at 15:21