Optimization problem: Computing the gradient

Question

I need help with the following exercise:

Solve $\min_{x\in\mathbb{R}^d} f(x)$, where $f:\mathbb{R}^d\to\mathbb{R}$. We define the inner product $(v,w)_A:= v^TAw$, induced by a positive definite and symmetric matrix $A\in\mathbb{R}^{d \times d}$.

First compute the gradient $\nabla^A f(x)$ with respect to this inner product, which is defined by $(\nabla^A f(x),v)_A=\partial f(x)(v)$ for all $v\in\mathbb{R}^d$.

OK, so I'm not sure whether I have to solve the following equation: $(\nabla^Af(x),v)_A=(\nabla^Af(x))^TAv=\partial f(x)(v)$. If so, how exactly? Or is there an easier way?

You don't have to solve that equation; that is the definition of $\nabla^A f$. In order to do the minimization, you have to solve for $\nabla^A f=0$. — Federico, Dec 06 '18 at 15:09

Federico · Accepted Answer · 2018-12-06T16:09:21.693

Let's make it clear. Given $f\in C^1(\mathbb R^n;\mathbb R^k)$, the derivative, total derivative, Fréchet derivative, differential, total differential, pushforward, whatever you call it, of $f$ at a point $x$ is the linear map $$ df_x=Df_x=f'(x)=\frac{df}{dx}:\mathbb R^n\to\mathbb R^k $$ such that $$ f(y)-f(x)-df_x(y-x) = o(|y-x|) \quad\text{as } y\to x . $$ In the standard bases of $\mathbb R^n$ and $\mathbb R^k$, the derivative is represented by the Jacobian matrix $J_f(x)$.

In the case of a scalar function, $k=1$ and the derivative is a linear map $df_x:\mathbb R^n\to\mathbb R$, that is, an element of the dual space $(\mathbb R^n)^*$.

If you fix a non-degenerate bilinear form $B:\mathbb R^n\times\mathbb R^n\to\mathbb R:(v,w)\mapsto B(v,w)$, then for every linear functional $\phi\in(\mathbb R^n)^*$ there is a vector $v_\phi\in\mathbb R^n$ such that $\phi(w)=B(v_\phi,w)$ for all $w\in\mathbb R^n$.
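In coordinates this representation can be written down explicitly. As a sketch, suppose $B(v,w)=v^TMw$ for an invertible matrix $M$ and $\phi(w)=c^Tw$ for a vector $c$ (the names $M$ and $c$ are introduced here only for illustration). Then

$$ v_\phi = M^{-T}c, \qquad\text{since}\qquad B(v_\phi,w)=(M^{-T}c)^TMw=c^TM^{-1}Mw=c^Tw=\phi(w). $$

Non-degeneracy of $B$ is exactly the invertibility of $M$, which is what makes $v_\phi$ exist and be unique.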

The gradient is the vector that represents the derivative with respect to a chosen non-degenerate form.

In our case, the linear functional that we want to represent is the derivative $df_x$ and the bilinear form is $B(v,w)=\langle Av,w\rangle=v^TAw$, where $A$ is a symmetric positive definite matrix and $\langle\,\cdot\,,\,\cdot\,\rangle$ is the standard scalar product.

If $\nabla f(x)=J_f(x)^T$ denotes the standard gradient (the one with respect to the standard scalar product, i.e. the Jacobian row written as a column vector), then $$ \nabla^A f(x) = A^{-1}\nabla f(x) $$ is the vector you are looking for. In fact, $$ B(\nabla^Af(x),w)=\langle A\nabla^Af(x),w\rangle =\langle AA^{-1}\nabla f(x),w\rangle =\langle \nabla f(x),w\rangle = df_x(w). $$
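The identity can also be checked numerically. The following is a minimal NumPy sketch, not part of the original exercise: the quadratic test function `f`, the matrices `Q` and `A`, and the vector `b` are all made up for illustration. It compares $(\nabla^A f(x),v)_A$ with a finite-difference approximation of the directional derivative $df_x(v)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical symmetric positive definite matrix A defining (v, w)_A = v^T A w.
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)

# Hypothetical test function f(x) = 0.5 x^T Q x - b^T x with symmetric Q.
S = rng.standard_normal((d, d))
Q = S + S.T
b = rng.standard_normal(d)

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def grad_f(x):
    # Standard gradient (with respect to the Euclidean scalar product).
    return Q @ x - b

def grad_A_f(x):
    # Gradient with respect to (.,.)_A: solve A g = grad f(x) instead of forming A^{-1}.
    return np.linalg.solve(A, grad_f(x))

def directional_derivative(x, v, h=1e-6):
    # Central finite difference approximating df_x(v).
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

x = rng.standard_normal(d)
v = rng.standard_normal(d)

lhs = grad_A_f(x) @ A @ v            # (grad^A f(x), v)_A
rhs = directional_derivative(x, v)   # df_x(v)
print(abs(lhs - rhs))                # should be small
```

Since $A$ is invertible, $\nabla^A f(x)=0$ holds exactly when $\nabla f(x)=0$, so the critical points of $f$ do not depend on the choice of $A$; only the direction of the gradient does.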

Can the downvoter please explain what exactly he doesn't understand of this laborious, in-depth and accurate answer? — Federico, Dec 06 '18 at 19:29
