Gradient vector and Hessian matrix of function containing vector of complex exponentials

Question

I would like to obtain the gradient vector and Hessian matrix for the following function:

$$f(\boldsymbol{\theta}) = \mathrm{real}\{u(\boldsymbol{\theta})^H\Gamma u(\boldsymbol{\theta})\}$$

where $u(\boldsymbol{\theta}) = \boldsymbol{u} = [e^{j\theta_1}, e^{j\theta_2}, \cdots, e^{j\theta_K}]^H$, $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_K]^T \in \mathbb{R}^{K \times 1}$, $\Gamma \in \mathbb{C}^{K \times K}$, and $j = \sqrt{-1}$.

From my limited understanding, the gradient vector should look something like:

$$\boldsymbol{g} = g(\boldsymbol{\theta}) = \frac{\partial}{\partial \boldsymbol{\theta}} (\boldsymbol{u}^{H}\Gamma \boldsymbol{u}) = \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \Gamma \boldsymbol{u} + \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \Gamma^H \boldsymbol{u} = \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \hat{\Gamma} \boldsymbol{u}$$

where $\hat{\Gamma} = \Gamma + \Gamma^H$.

If this is correct, then $\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \in \mathbb{C}^{K \times K}$ for $\boldsymbol{g}$ to be a $K \times 1$ vector. Is $\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}}$ therefore a diagonal matrix with $i$th diagonal entry equal to $\frac{\partial \boldsymbol{u}}{\partial \theta_i} = -je^{-j\theta_i}$?

The Hessian matrix should then be given by

$$\boldsymbol{H} = H(\boldsymbol{\theta}) = \frac{\partial }{\partial \boldsymbol{\theta}^T}(\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \hat{\Gamma} \boldsymbol{u}) = \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}}\hat{\Gamma} \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} + \mathrm{diag}\{\boldsymbol{u}^H\hat{\Gamma}^H\frac{\partial^2 \boldsymbol{u}}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\}$$

with $\frac{\partial^2 \boldsymbol{u}}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}$ equal to a $K \times K$ diagonal matrix with $i$th diagonal entry equal to $\frac{\partial^2 \boldsymbol{u}}{\partial {\theta}_i \partial {\theta}_i} = -e^{j\theta_i}$. Here I believe the $\mathrm{diag}\{\cdot\}$ function is required to convert the $1 \times K$ vector $\boldsymbol{u}^H\hat{\Gamma}^H\frac{\partial^2 \boldsymbol{u}}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}$ into a $K \times K$ matrix; however, I am not confident about this.

The above equations produce $\boldsymbol{g} \in \mathbb{C}^{K \times 1}$ and $\boldsymbol{H} \in \mathbb{C}^{K \times K}$. Given that both $f(\boldsymbol{\theta})$ and $\boldsymbol{\theta}$ are real valued, should some modifications be made to $\boldsymbol{g}$ and $\boldsymbol{H}$ to produce a real vector and matrix, respectively? (For example, by taking only the real or imaginary component of $\boldsymbol{g}$ and $\boldsymbol{H}$.)

Through trial and error, I have been able to use the following in an implementation of Powell's "dogleg" method to achieve a reasonable level of performance (i.e., minimisation of $f(\boldsymbol{\theta})$ with varying degrees of success):

$$ \hat{\boldsymbol{g}} = -\mathrm{real}\{\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \hat{\Gamma} \boldsymbol{u}\} $$

$$ \hat{\boldsymbol{H}} = -\mathrm{real}\{\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}} \hat{\Gamma} \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}}\} + \alpha\boldsymbol{I}_K $$

where $\alpha$ is very small and $\boldsymbol{I}_K$ is a $K \times K$ identity matrix. I have found that $\mathrm{diag}\{\boldsymbol{u}^H\hat{\Gamma}^H\frac{\partial^2 \boldsymbol{u}}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\}$ is of lower significance than $\frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}}\hat{\Gamma} \frac{\partial \boldsymbol{u}}{\partial \boldsymbol{\theta}}$, and that the addition of $\alpha\boldsymbol{I}_K$ aids with inversion of $\hat{\boldsymbol{H}}$.

Any help with this would be greatly appreciated. Even a confirmation that I am close to (or nowhere near) the correct solution would be fantastic.

Thanks for reading.

greg · Accepted Answer · 2018-04-01T00:28:04.783

For ease of typing, I'll use Latin in place of your Greek letters, $$A = \Gamma, \,\,\,\,x = \theta $$ I'll also represent the trace/Frobenius and elementwise/Hadamard products (for generic arguments $Y$, $Z$, $a$, $b$) by $$\eqalign{ \alpha &= Y:Z &\implies \alpha = {\rm tr}(Y^TZ) \cr c &= a\odot b &\implies c_k = a_k b_k \cr }$$ and the transpose, complex conjugate, and Hermitian conjugate of $A$ by $\{A^T, A^*, A^H\}$ respectively.

The function and its differential are $$\eqalign{ f &= \frac{1}{2}\big(Au:u^* + A^*u^*:u\big) \cr df &= \frac{1}{2}\big(A\,du:u^* + Au:du^* + A^*\,du^*:u + A^*u^*:du\big) \cr &=(Bu)^*:du + (Bu):du^* \cr }$$ where $B = \frac{1}{2}(A+A^H)$ is the hermitian component of $A$.
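To spell out why only the Hermitian part of $A$ enters (a short check using nothing beyond the definitions above): $u^HAu$ is a scalar, so its complex conjugate equals its Hermitian transpose, and therefore $$\eqalign{ f &= \frac{1}{2}\big(u^HAu + (u^HAu)^H\big) \cr &= \frac{1}{2}\big(u^HAu + u^HA^Hu\big) \cr &= u^HBu \cr }$$ Any skew-Hermitian part of $A$ contributes only to the imaginary part of $u^HAu$ and drops out of $f$.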

Next we need the differential of the exponential function applied elementwise to a vector $$\eqalign{ u &= \exp(jx) \cr du &= u\odot d(jx) = (ju)\odot dx \cr }$$ Substituting this, we can continue on to find the gradient with respect to $x$ $$\eqalign{ df &= (Bu)^*:(ju\odot dx) + (Bu):(ju\odot dx)^* \cr &= ju\odot(Bu)^*:dx + (ju)^*\odot(Bu):dx \cr &= (VB^*u^* + V^*Bu):dx \cr g=\frac{\partial f}{\partial x} &= VB^*u^* + V^*Bu \cr }$$ where $V={\rm Diag}(ju)$. Being of the form $(y+y^*)$, we see that $g\in{\mathbb R}^K$ as expected.
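A quick numerical check of this gradient can catch sign slips. The sketch below assumes NumPy and takes $u=\exp(jx)$ as above; the function names `f`, `grad` and `grad_fd`, the random test matrix and the step size `h` are my own illustrative choices, not part of the question. It uses the elementwise form $g = 2\,{\rm real}\{(ju)^*\odot(Bu)\}$, which is the same thing as $VB^*u^* + V^*Bu$.

```python
import numpy as np

def f(x, A):
    """f(x) = real(u^H A u) with u = exp(j x)."""
    u = np.exp(1j * x)
    return np.real(np.vdot(u, A @ u))  # np.vdot conjugates its first argument

def grad(x, A):
    """Analytic gradient g = V B^* u^* + V^* B u, written elementwise."""
    u = np.exp(1j * x)
    B = 0.5 * (A + A.conj().T)          # Hermitian part of A
    return 2.0 * np.real(np.conj(1j * u) * (B @ u))

def grad_fd(x, A, h=1e-6):
    """Central finite-difference gradient, for checking only."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e, A) - f(x - e, A)) / (2 * h)
    return g

rng = np.random.default_rng(0)
K = 4
A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
x = rng.uniform(-np.pi, np.pi, K)

max_err = np.max(np.abs(grad(x, A) - grad_fd(x, A)))
print("max |analytic - finite difference| =", max_err)
```

The analytic gradient is real by construction, so no extra real/imaginary extraction is needed before handing it to an optimiser.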

At this point, you should pause and consider using Conjugate Gradients, or Barzilai-Borwein, or some other method that does not require the Hessian, since that is going to be complicated.
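As an illustration of a gradient-only approach, here is a minimal Barzilai-Borwein sketch, assuming NumPy. The names `bb_minimise`, `alpha0` and `iters`, the fallback to `alpha0` when the curvature estimate is not positive, and the bookkeeping of the best iterate are all my own assumptions for the sake of a runnable example. BB steps are non-monotone and $f$ is not convex, so treat this as a starting point rather than a tuned solver.

```python
import numpy as np

def f(x, A):
    u = np.exp(1j * x)
    return np.real(np.vdot(u, A @ u))  # real(u^H A u)

def grad(x, A):
    u = np.exp(1j * x)
    B = 0.5 * (A + A.conj().T)
    return 2.0 * np.real(np.conj(1j * u) * (B @ u))

def bb_minimise(x0, A, iters=200, alpha0=1e-2):
    """Barzilai-Borwein gradient descent; returns the best iterate seen."""
    x = x0.copy()
    g = grad(x, A)
    alpha = alpha0
    best_x, best_f = x.copy(), f(x, A)
    for _ in range(iters):
        x_new = x - alpha * g
        g_new = grad(x_new, A)
        s, y = x_new - x, g_new - g
        sy = s @ y
        # BB1 step length; fall back to alpha0 if curvature is not positive
        alpha = (s @ s) / sy if sy > 1e-12 else alpha0
        x, g = x_new, g_new
        fx = f(x, A)
        if fx < best_f:
            best_x, best_f = x.copy(), fx
    return best_x, best_f

rng = np.random.default_rng(0)
K = 6
A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
x0 = rng.uniform(-np.pi, np.pi, K)

x_best, f_best = bb_minimise(x0, A)
print("f(x0) =", f(x0, A), " best f found =", f_best)
```

Each iteration costs one matrix-vector product with $B$, and no Hessian or linear solve is involved.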

Also, instead of insisting on a matrix product involving $V={\rm Diag}(ju)$, you should just write code for a Hadamard product with the underlying vector.
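To make that concrete, the sketch below (NumPy assumed; variable names and the random test data are mine) evaluates the gradient once with an explicit diagonal matrix and once with elementwise products, and compares the two.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5
x = rng.uniform(-np.pi, np.pi, K)
A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
B = 0.5 * (A + A.conj().T)   # Hermitian part of A
u = np.exp(1j * x)

# Matrix form: builds a K x K diagonal matrix just to scale a vector
V = np.diag(1j * u)
g_matrix = V @ B.conj() @ u.conj() + V.conj() @ B @ u

# Hadamard form: scale the vector B u directly, no K x K diagonal needed
g_hadamard = 2.0 * np.real(np.conj(1j * u) * (B @ u))

print(np.max(np.abs(g_matrix.imag)))          # imaginary part is round-off only
print(np.max(np.abs(g_matrix.real - g_hadamard)))
```

The Hadamard version avoids storing $V$ and replaces two dense matrix products with elementwise multiplications.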

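Differentiating the gradient once more, and using $dV = {\rm Diag}(j\,du) = -{\rm Diag}(u\odot dx)$, let $$\eqalign{ W &= {\rm Diag}(jBu) \cr L &= (V^*B-W^*)V \cr }$$ then the Hessian is $$H=\frac{\partial^2f}{\partial x\partial x^T}=L+L^*$$ As a sanity check, take $K=1$ with real $A=a$: the function is the constant $f=a$, and indeed $L=(-ju^*a+ju^*a)(ju)=0$.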
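Since the Hessian is easy to get wrong by a sign, it is worth checking numerically. The sketch below (NumPy assumed; `hess`, `hess_fd` and the test data are my own names and choices) uses my elementwise expansion obtained by differentiating $g$ directly, $H_{kn} = 2\,{\rm real}\{u_k^* B_{kn} u_n\} - 2\,\delta_{kn}\,{\rm real}\{u_k^*(Bu)_k\}$, and compares it with central differences of the analytic gradient. Please check it against whichever closed form you implement.

```python
import numpy as np

def grad(x, A):
    u = np.exp(1j * x)
    B = 0.5 * (A + A.conj().T)
    return 2.0 * np.real(np.conj(1j * u) * (B @ u))

def hess(x, A):
    """Elementwise Hessian of f(x) = real(u^H A u), u = exp(j x)."""
    u = np.exp(1j * x)
    B = 0.5 * (A + A.conj().T)
    # Entries conj(u_k) * B_kn * u_n via broadcasting
    M = np.conj(u)[:, None] * B * u[None, :]
    H = 2.0 * np.real(M)
    # Diagonal correction from differentiating Diag(conj(j u))
    H -= 2.0 * np.diag(np.real(np.conj(u) * (B @ u)))
    return H

def hess_fd(x, A, h=1e-6):
    """Central differences of the analytic gradient, column by column."""
    K = x.size
    H = np.zeros((K, K))
    for n in range(K):
        e = np.zeros(K)
        e[n] = h
        H[:, n] = (grad(x + e, A) - grad(x - e, A)) / (2 * h)
    return H

rng = np.random.default_rng(2)
K = 4
A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
x = rng.uniform(-np.pi, np.pi, K)

H = hess(x, A)
print("max |H - H_fd| =", np.max(np.abs(H - hess_fd(x, A))))
print("symmetric:", np.allclose(H, H.T))
```

The result is real and symmetric but generally indefinite, which is consistent with the question's need to add $\alpha\boldsymbol{I}_K$ before inverting.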

This is great! Is there an easy extension of this to $u = r e^{j \theta}$ where $x = (r, \theta)$ now, and we want the Hessian w.r.t. $r$ and $\theta$? — jjjjjj, Mar 10 '19 at 04:19
I tried to write up my Q more clearly here: https://math.stackexchange.com/questions/3141990/mathbbc-mathbbr-calculus-for-quadratic-form — jjjjjj, Mar 10 '19 at 05:40
