Questions tagged [gradient-descent]

"Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

Gradient descent is based on the observation that if the multi-variable function $F(x)$ is defined and differentiable in a neighborhood of a point $a$ , then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$, $-\nabla F(a )$. It follows that, if

$$a_{n+1}=a_n-\gamma \nabla F(a_n)$$

for positive $\gamma$ that is small enough, then $F(a_n) \ge F(a_{n+1})$.

1029 questions

vote

1 answer

Backtracking Line Search - Graphical Interpretation

I am new to convex optimization and got a little bit confused while reading up on the backtracking line search. What is $f(x) + \alpha t \nabla f(x)^T\Delta x$ ? I know of the Taylor first order expansion but this does not look like it.

gradient-descent

asked Mar 09 '18 at 10:37

Kong

vote

0 answers

Improving the convergence of gradient descent method

I am looking for methods to improve the gradient descent algorithm. In particular, my problem is that i'm trying to minimise the sum of squared errors between some observed data and the data produced by my model (based on non-linear differential…

gradient-descent

asked Dec 05 '17 at 14:34

Hikaru

vote

1 answer

why Clipped gradient can get stuck in a flat spot

I read the paper Towards Evaluating the Robustness of Neural Networks by Carlini there is a phrase: the algorithm Clipped gradient can get stuck in a flat spot where it has increased some component x_i to be substantially larger than the maximum…

gradient-descent

asked Jul 21 '17 at 15:29

Ivan Kush

votes

1 answer

Gradient of a loss function

Considering the following Loss function: $$A{L_t}\left( {{\mathbf{w}_t}} \right) = \sum\nolimits_{j = 1}^k {{L_t}\left( {{b^j}} \right)w_t^j}$$ I want to calculate the gradient of it ($\nabla A{L_t}\left( {{\mathbf{w}_t}} \right)$). The variables…

gradient-descent

asked Oct 07 '23 at 10:58

Shayan

votes

0 answers

Weight Vector calculation in CW-OGD strategy

I'm trying to understand an Online Portfolio Selection (OPS) strategy proposed by Zhang et al. (2021), named combination weights based on online gradient descent (CW-OGD). The algorithm is as follows: My question is about the 7th line of the…

gradient-descent

asked Oct 05 '23 at 17:48

Shayan

votes

2 answers

How to caculate the gradient of $f(x)=\frac{1}{2}x^TMx+b^Tx$?

A linear complementarity problem(LCP): $y=Mx+b$ $x_iy_i=0$ $x_i ≥0$, $y_i≥0$ $M=\left[ \begin{array}{ccc}0 & -A^T\\A & 0\end{array}\right]$ $x \in R^n$, $y\in R^n$ Problem: $f(x)=\frac{1}{2}x^TMx+b^Tx$, please calculate the gradient $\nabla…

gradient-descent

asked Jul 19 '22 at 15:30

Yusong Luo

votes

2 answers

How can I imagine / visualize gradient descent with many variables?

I can imagine something like above. But it works only when solving linear regression using gradient descent. How can I begin imagining when there are 4, 5, 6, ... 10000 variables? What does it even look like? Sorry, just a total beginner.

gradient-descent

asked Jan 23 '21 at 19:05

hey

votes

0 answers

What is the direction of the gradient in gradient descent?

I understand in the gradient descent optimisation algorithm , in each iteration we need to take a step opposite to the direction of the gradient. In the above diagram the direction of the gradient vector is opposite to delta theta? Can somebody…

gradient-descent

asked Dec 26 '19 at 07:48

Ayan Biswas

votes

0 answers

What is trivial step size vs non-trivial step size?

In the TRPO paper, the authors say we first prove that minimizing a certain surrogate objective function guarantees policy improvement with non-trivial step sizes. What is a non-trivial step size? 1e-2 , 1e-3? TRPO paper :…

gradient-descent

asked Nov 04 '19 at 04:04

Aastha Dua

votes

1 answer

Descent Direction

I am somewhat confused, for a gradient descent line-minimizer algorithm (backtracking), for which I need to compute the descent direction $p$. Now, I already computed the gradient $\nabla f$ for my function $f$, and was wondering in what way differs…

gradient-descent

asked Sep 19 '19 at 00:51

baconStrips

votes

1 answer

Gradient of next neighbor interaction

Is it correct to calculate the gradient of $$f(x) = \sum_{i} g(x_i, x_{i+1})$$ simply by saying: $$\nabla f(x) = (g(x_{i+1}), g(x_{i}))$$ Or is that completely wrong? (I assume it is) Thanks.

gradient-descent

asked Jul 02 '19 at 20:54

ReRed

votes

1 answer

How exactly is the error backpropagated in backpropagation

I am reading a book on neural networks, and am now doing a chapter on backpropagation. (See chapter here). In this chapter, the writer is presenting four equations, that together form the backbone of the Backpropagation algorithm. In the second…

gradient-descent

asked Jun 28 '19 at 08:25

Psychotechnopath

votes

0 answers

Why my gradient descent seems to diverge "pair-wise"?

Why my gradient descent seems to diverge "pair-wise"? I've checked the algorithms and they work for golden section line search and "small step parameter". However, when trying to get the algo to diverge, I notice that the results indicate that the…

gradient-descent

asked Jan 22 '19 at 15:11

mavavilj

7,270

votes

2 answers

Can the directional derivative be considered a total derivative?

I guess I am having a little problem relating several concepts in one package. The partial derivative Vs. Total derivative. Total derivative appears to be a place holder where as the partial derivative assumes the other variable is fixed with…

gradient-descent

asked May 26 '18 at 19:45

Sedumjoy

1,569

votes

1 answer

Minimizing $f(\vec{x})$ along a line via the line search algorithm

I'm trying to learn the conjugate gradient method, and I'm reading this article. The author presents the line search algorithm as the procedure where we choose a step size $\alpha$ that minimizes our function $f$ along a line. I understand the…

gradient-descent

asked May 23 '18 at 23:33

Ptheguy

Prev 1

3 Next