3

What are the gradient and Hessian of this function?

$$ f(X)=\langle X,D\rangle-c \cdot \sqrt{\langle X,E\rangle}$$ where $X$, $D$, and $E$ are all positive semidefinite matrices.

Where does the gradient become zero?

user85361
  • 845

1 Answer

3

The gradient is simple, at least if you have the Matrix Cookbook: $$\nabla f(X) = D - \frac{c}{2}\langle X, E\rangle^{-1/2} E.$$ Assuming $D\neq 0$, $E\neq 0$, and $c>0$, it is clear that $\nabla f(X)=0$ is possible only if $D=\alpha E$ for some scalar $\alpha>0$. In that case, the gradient is zero whenever $(c/2)\langle X,E\rangle^{-1/2}=\alpha$, that is, whenever $\langle X,E\rangle = c^2/(4\alpha^2)$.
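As a sanity check on the gradient formula, here is a minimal numerical sketch. It assumes $\langle A,B\rangle$ is the Frobenius inner product $\mathrm{tr}(A^TB)$ and that $c>0$. The helper names (`inner`, `grad_f`, `random_psd`) and the values of `n`, `c`, and `alpha` are made up for illustration.

```python
import numpy as np

def inner(A, B):
    """Frobenius inner product <A, B> = trace(A^T B)."""
    return float(np.sum(A * B))

def f(X, D, E, c):
    return inner(X, D) - c * np.sqrt(inner(X, E))

def grad_f(X, D, E, c):
    # Formula from the answer: D - (c/2) <X,E>^{-1/2} E
    return D - 0.5 * c * inner(X, E) ** -0.5 * E

def random_psd(n, rng):
    # A A^T is always positive semidefinite
    A = rng.standard_normal((n, n))
    return A @ A.T

rng = np.random.default_rng(0)
n, c = 4, 1.5  # illustrative values, with c > 0 assumed
X, D, E = (random_psd(n, rng) for _ in range(3))

# Compare <grad f(X), H> with a central finite difference of t -> f(X + tH)
H = rng.standard_normal((n, n))
t = 1e-5
fd = (f(X + t * H, D, E, c) - f(X - t * H, D, E, c)) / (2 * t)
grad_error = abs(fd - inner(grad_f(X, D, E, c), H))

# Stationary case: D = alpha * E, with X rescaled so <X,E> = c^2 / (4 alpha^2)
alpha = 0.7
D_par = alpha * E
X_stat = X * (c / (2 * alpha)) ** 2 / inner(X, E)
stat_norm = np.linalg.norm(grad_f(X_stat, D_par, E, c))

print(grad_error, stat_norm)
```

Both printed numbers should be close to zero. The first checks the gradient formula, and the second checks that the gradient vanishes on the set $\langle X,E\rangle = c^2/(4\alpha^2)$ when $D=\alpha E$.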

The Hessian isn't difficult in concept either; the challenge is writing it down. The Hessian of a function of a vector, $g:\mathbb{R}^n\rightarrow\mathbb{R}$, can be represented by a symmetric matrix. But when you have a function of a matrix, $f:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}$, you can't represent it by a matrix anymore. Instead, the Hessian is a symmetric linear mapping on $\mathbb{R}^{m\times n}$. The best you can do, in my view, is look at the second directional derivative. If $H$ is the search direction, then $$D^2f(X)[H,H] = \langle \nabla^2 f(X)[H],H\rangle = \frac{c}{4} \langle X, E\rangle^{-3/2} \langle E, H \rangle^2.$$ Another way to look at it is that $\mathbb{R}^{m\times n}$ is isomorphic to $\mathbb{R}^{mn}$ via the vectorization function $\textbf{vec}$. If you define $$g:\mathbb{R}^{mn}\rightarrow\mathbb{R}, \quad g(x) \triangleq f({\textbf{vec}}^{-1}(x))$$ then $$\nabla g(x) = d - \frac{c}{2}(e^Tx)^{-1/2} e, \quad \nabla^2 g(x) = \frac{c}{4}(e^Tx)^{-3/2} ee^T$$ where $d\triangleq\textbf{vec}(D)$ and $e\triangleq\textbf{vec}(E)$.
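The two views of the Hessian can be compared numerically. The sketch below again assumes the Frobenius inner product and $c>0$. It uses entrywise-positive random matrices purely to keep $\langle X,E\rangle>0$ in the demo, and the sizes and variable names are illustrative. NumPy's `reshape(-1)` stacks rows rather than columns, but any fixed ordering gives the same inner products as long as it is used consistently.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, c = 3, 4, 2.0  # illustrative sizes, with c > 0 assumed

# Entrywise-positive matrices keep <X, E> > 0 in this demo
X = rng.random((m, n)) + 0.1
E = rng.random((m, n)) + 0.1
D = rng.random((m, n))
H = rng.standard_normal((m, n))  # search direction

def f(X):
    return np.sum(X * D) - c * np.sqrt(np.sum(X * E))

u = np.sum(X * E)  # <X, E>

# Closed form: D^2 f(X)[H, H] = (c/4) <X,E>^{-3/2} <E,H>^2
quad_form = 0.25 * c * u ** -1.5 * np.sum(E * H) ** 2

# Second central difference of h(t) = f(X + tH) at t = 0
t = 1e-4
fd_second = (f(X + t * H) - 2 * f(X) + f(X - t * H)) / t ** 2

# Vectorized view: the Hessian of g is the mn x mn matrix (c/4)(e^T x)^{-3/2} e e^T
e = E.reshape(-1)
h = H.reshape(-1)
hess_g = 0.25 * c * u ** -1.5 * np.outer(e, e)
vec_form = h @ hess_g @ h

# Rank-one with a nonnegative coefficient, so no eigenvalue should be negative
eigs = np.linalg.eigvalsh(hess_g)

print(quad_form, fd_second, vec_form, eigs.min())
```

The first three printed values should agree up to finite-difference error, and the smallest eigenvalue should be zero up to rounding.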

EDIT: The property that the matrices are semidefinite is largely irrelevant. However, it does ensure that $\langle X, E \rangle \geq 0$, so the function is well-defined over all of the desired values of $X$. Obviously, the function is not differentiable when $\langle X, E \rangle = 0$.

Michael Grant
  • 19,450
  • Thanks @Michael Grant for your excellent answer. Where can I read more about directional derivatives and the way you computed the Hessian? I don't understand what it means that $H$ is the search direction (used for the Hessian). – user85361 Mar 09 '15 at 21:13
  • What I mean is this: define $h(t) = f(X+tH)$, where $t$ is a scalar and $H$ is now a fixed search direction. Then the directional derivatives are $$h'(0) = \langle \nabla f(X), H \rangle, \quad h''(0)=\langle \nabla^2 f(X)[H], H \rangle.$$ – Michael Grant Mar 09 '15 at 21:32
  • 1
    @MichaelGrant Hi Michael, could you please explicitly write the formula $D^2f(X)[H,H] = \langle \nabla^2 f(X)[H],H\rangle$? I guess it should represent the "quadratic form" on the matrix argument when the Hessian is a 4th order tensor. Is it correct? I have a similar question that I posted here link – yes Jul 10 '20 at 14:53