
I am trying to learn gradient descent for machine learning. In this highly cited research paper, https://arxiv.org/pdf/1609.04747.pdf, the author presents gradient descent as

$$\theta = \theta - \eta \nabla_\theta J(\theta)$$

I have never seen this expression before. Is this some analytical formula for calculating the variables $\theta$? Wouldn't the $\theta$ on both sides cancel out? I am confused, please help.

  • The update is iterative, so it should be written $\theta_{k+1} = \theta_k - \eta\nabla_\theta J(\theta_k)$. Perhaps the $\eta$ should be subscripted too. That's why the thetas don't cancel out. – CogitoErgoCogitoSum Apr 15 '18 at 00:34
  • Now I'm not entirely sure where this particular expression came from. I do know the gradient descent method, though; I've studied it in a convex optimization course. It involves a step size, which I'm presuming is what $\eta$ is, and a step direction, which I'm presuming is what $\nabla_\theta J(\theta)$ is, though I'm unfamiliar with the notation. Does $J$ refer to the Jacobian or some other matrix? There are slight modifications to the gradient descent method that involve another matrix to improve efficiency. – CogitoErgoCogitoSum Apr 15 '18 at 00:37

1 Answer


As @CogitoErgoCogitoSum mentioned in the comments, the iteration should be written as $$ \theta^{k+1} = \theta^k - \eta \nabla J(\theta^k). $$ The "=" in the paper's version is an assignment (an update rule), not an equation to be solved for $\theta$, so nothing cancels. Starting at the point $\theta^k$, we take a step in the direction of steepest descent (that is, the negative gradient direction), which moves us to a new point $\theta^{k+1}$ where, provided the step size $\eta$ is small enough, the value of $J$ has been reduced.
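To make the update rule concrete, here is a minimal Python sketch. The objective $J(\theta) = (\theta - 3)^2$, the starting point, the step size, and the function names `grad_J` and `gradient_descent` are all assumptions chosen for illustration; none of them come from the paper. The line `theta = theta - eta * grad_J(theta)` is the paper's expression read as an assignment: the old value of $\theta$ on the right is used to compute the new value on the left.

```python
def grad_J(theta):
    """Gradient of the illustrative objective J(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta0, eta=0.1, steps=100):
    """Run `steps` updates of theta <- theta - eta * grad_J(theta).

    Returns the final theta and the list of all iterates, starting with theta0.
    """
    theta = theta0
    history = [theta]
    for _ in range(steps):
        # Assignment, not an equation: the right-hand side uses the old theta.
        theta = theta - eta * grad_J(theta)
        history.append(theta)
    return theta, history


# Hypothetical starting point; the iterates move toward the minimizer theta = 3.
theta_final, history = gradient_descent(theta0=0.0)
print(theta_final)
```

For this particular quadratic and step size, each update shrinks the distance to the minimizer by a constant factor, so the iterates approach $\theta = 3$. A step size that is too large can instead make the iterates diverge.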

littleO