Questions tagged [gradient-descent]

"Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

Gradient descent is based on the observation that if the multi-variable function $F(x)$ is defined and differentiable in a neighborhood of a point $a$ , then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$, $-\nabla F(a )$. It follows that, if

$$a_{n+1}=a_n-\gamma \nabla F(a_n)$$

for positive $\gamma$ that is small enough, then $F(a_n) \ge F(a_{n+1})$.

1029 questions
1
vote
1 answer

Backtracking Line Search - Graphical Interpretation

I am new to convex optimization and got a little bit confused while reading up on the backtracking line search. What is $f(x) + \alpha t \nabla f(x)^T\Delta x$ ? I know of the Taylor first order expansion but this does not look like it.
Kong
  • 884
1
vote
0 answers

Improving the convergence of gradient descent method

I am looking for methods to improve the gradient descent algorithm. In particular, my problem is that i'm trying to minimise the sum of squared errors between some observed data and the data produced by my model (based on non-linear differential…
Hikaru
  • 941
1
vote
1 answer

why Clipped gradient can get stuck in a flat spot

I read the paper Towards Evaluating the Robustness of Neural Networks by Carlini there is a phrase: the algorithm Clipped gradient can get stuck in a flat spot where it has increased some component x_i to be substantially larger than the maximum…
Ivan Kush
  • 111
0
votes
1 answer

Gradient of a loss function

Considering the following Loss function: $$A{L_t}\left( {{\mathbf{w}_t}} \right) = \sum\nolimits_{j = 1}^k {{L_t}\left( {{b^j}} \right)w_t^j}$$ I want to calculate the gradient of it ($\nabla A{L_t}\left( {{\mathbf{w}_t}} \right)$). The variables…
Shayan
  • 113
0
votes
0 answers

Weight Vector calculation in CW-OGD strategy

I'm trying to understand an Online Portfolio Selection (OPS) strategy proposed by Zhang et al. (2021), named combination weights based on online gradient descent (CW-OGD). The algorithm is as follows: My question is about the 7th line of the…
Shayan
  • 113
0
votes
2 answers

How to caculate the gradient of $f(x)=\frac{1}{2}x^TMx+b^Tx$?

A linear complementarity problem(LCP): $y=Mx+b$ $x_iy_i=0$ $x_i ≥0$, $y_i≥0$ $M=\left[ \begin{array}{ccc}0 & -A^T\\A & 0\end{array}\right]$ $x \in R^n$, $y\in R^n$ Problem: $f(x)=\frac{1}{2}x^TMx+b^Tx$, please calculate the gradient $\nabla…
0
votes
2 answers

How can I imagine / visualize gradient descent with many variables?

I can imagine something like above. But it works only when solving linear regression using gradient descent. How can I begin imagining when there are 4, 5, 6, ... 10000 variables? What does it even look like? Sorry, just a total beginner.
hey
  • 209
0
votes
0 answers

What is the direction of the gradient in gradient descent?

I understand in the gradient descent optimisation algorithm , in each iteration we need to take a step opposite to the direction of the gradient. In the above diagram the direction of the gradient vector is opposite to delta theta? Can somebody…
0
votes
0 answers

What is trivial step size vs non-trivial step size?

In the TRPO paper, the authors say we first prove that minimizing a certain surrogate objective function guarantees policy improvement with non-trivial step sizes. What is a non-trivial step size? 1e-2 , 1e-3? TRPO paper :…
0
votes
1 answer

Descent Direction

I am somewhat confused, for a gradient descent line-minimizer algorithm (backtracking), for which I need to compute the descent direction $p$. Now, I already computed the gradient $\nabla f$ for my function $f$, and was wondering in what way differs…
0
votes
1 answer

Gradient of next neighbor interaction

Is it correct to calculate the gradient of $$f(x) = \sum_{i} g(x_i, x_{i+1})$$ simply by saying: $$\nabla f(x) = (g(x_{i+1}), g(x_{i}))$$ Or is that completely wrong? (I assume it is) Thanks.
ReRed
  • 277
0
votes
1 answer

How exactly is the error backpropagated in backpropagation

I am reading a book on neural networks, and am now doing a chapter on backpropagation. (See chapter here). In this chapter, the writer is presenting four equations, that together form the backbone of the Backpropagation algorithm. In the second…
0
votes
0 answers

Why my gradient descent seems to diverge "pair-wise"?

Why my gradient descent seems to diverge "pair-wise"? I've checked the algorithms and they work for golden section line search and "small step parameter". However, when trying to get the algo to diverge, I notice that the results indicate that the…
mavavilj
  • 7,270
0
votes
2 answers

Can the directional derivative be considered a total derivative?

I guess I am having a little problem relating several concepts in one package. The partial derivative Vs. Total derivative. Total derivative appears to be a place holder where as the partial derivative assumes the other variable is fixed with…
Sedumjoy
  • 1,569
0
votes
1 answer

Minimizing $f(\vec{x})$ along a line via the line search algorithm

I'm trying to learn the conjugate gradient method, and I'm reading this article. The author presents the line search algorithm as the procedure where we choose a step size $\alpha$ that minimizes our function $f$ along a line. I understand the…
Ptheguy
  • 359