1

To preface the question that I actually want an answer to, I've read the paper by Nicholas J. Higham that computes the square root of a matrix via the Newton's method. He utilises the function $F(X) = X^2 - A$, where X is the unknown matrix and A is the matrix whose root is being calculated. He then computes the Frechet derivative of this matrix (assuming the matrices belong to $C^{n * n}$). He does this simply by computing $F(X+H) = (X+H)^2 - A$, expanding the terms and then equating them to the Taylor series expansion upto second order accuracy of F(X+H), which is $$F(X+H) = X^2 - A + F^\prime(X).H + H^2$$ where H is a matrix of very small perturbations in order to compute the infinitesimal decrement in the value of $F(X)$. From the expansion of $F(X+H)$ he obtains that $F^\prime(X).H = X.H + H.X$. Now to my question.

I have a matrix equation which involves a mix of matrix products and Hadamard products in some of the terms. Say an equation of the form $$F(X) = X + (X.A)\odot X$$ I am allowed to do a Frechet derivative on this. If I perform an expansion as Higham did in his paper, I will have some terms in the Frechet derivative that will have the Hadamard product associated with them as properties will carry over. However, if I actually calculate the Frechet derivative as the Jacobian, I realise that the terms have become linearised and will not have this nonlinear mixed operator term. However, computing the Jacobian leads to a 4 dimensional tensor. I have no idea how this could be represented as any sort of matrix equation. Any help here would be appreciated. (For making things a bit simpler, in my case matrices contain only real numbers)

1 Answers1

1

The differential can always be written in terms of the gradient $$\eqalign{ dF &= F':dX \\ }$$ It can also be calculated via first-order perturbation of the functional form $$\eqalign{ F &= X + X\odot(XA) \\ dF &= dX + dX\odot(XA) + X\odot(dX\;A) \\ }$$ Equating the expressions and changing $\,dX\to H\,$ yields $$\eqalign{ \def\LR#1{\left(#1\right)} \def\vc{\operatorname{vec}} \def\Diag{\operatorname{Diag}} \def\D{\operatorname{Diag}} \def\A{A^T\otimes I} F':H &= H + (XA)\odot H + X\odot(HA) \\ }$$

Update

First, vectorize individual terms $$\eqalign{ h &= \vc(H),\qquad f=\vc(F),\qquad x=\vc(X) \\ B &= (\A),\quad \vc(HA) = Bh,\quad \vc(F':H) = Jh \\ }$$ then the whole equation $$\eqalign{ Jh &= h + (Bx)\odot h + x\odot(Bh) \\ }$$ Replace Hadamard products with diagonal matrices $$\eqalign{ Jh &= I\cdot h + \D(Bx)\cdot h + \D(x)\cdot Bh \\ &= \LR{I + \D(Bx) + \D(x)\cdot B} h \\ }$$ and isolate the Jacobian $$\eqalign{ \def\E{{\cal E}} \def\M{{\mathbb M}} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} J &= I + \D(Bx) + \D(x)\cdot B \\\\ }$$


If you are comfortable with tensors, then you can use the $6^{th}$ order $\M$ tensor from this post and the $4^{th}$ order identity tensor $\E$ from this post to calculate the tensor-valued gradient like so $$\eqalign{ dF &= dX + dX\odot(XA) + X\odot(dX\;A) \\ &= \E:dX + \LR{XA}:\M:dX + X:\M:\LR{\E\cdot A^T:dX} \\ &= \LR{\E + \LR{XA}:\M + X:\M\cdot A^T} : dX \\ \grad{F}{X} &= \E + \LR{XA}:\M + X:\M\cdot A^T \\ }$$ The elements of this tensor are identical to those of the $J$ matrix, but reshaped from a $(n^2\times n^2)$ matrix into a $(n\times n\times n\times n)$ tensor.

So the gradient requires ${\cal O}(n^4)$ of storage whether it is represented as a matrix or a tensor. When $n$ is large, this will be a problem.

greg
  • 35,825
  • Apologies if I was not very clear in presenting what I wanted. What I actually want is to obtain a form of $F^\prime$ that is a matrix equation in H and X and whatever other matrices are relevant, but does not contain any Hadamard product operators. I honestly do not know if that is impossible or not though. However, thank you for your answer, as I got to learn something about the gradient representation. – Robby Ram Jul 07 '23 at 09:45
  • I am adding a new comment solely so that what I want as an answer is clearly presented in the previous comment. Thank you for your answer. This works for me. However, if the matrices are of very large dimensions, say 10000 x 10000, I believe this would become computationally expensive right? To work with Kronecker products? – Robby Ram Jul 07 '23 at 12:16
  • This is a comment to the most recent edit you provided describing the tensor representation. Thank you for providing a clear picture on what I was trying to grasp at. I guess this is one of those problems that will require very different algorithms to solve based on the sizes of the matrices involved. – Robby Ram Jul 07 '23 at 14:03
  • 1
    If you are trying to solve this specific problem, note that it has a simple algebraic solution $$\eqalign{ X = -NA^{-1} \quad\implies\quad &X\odot(XA) = X\odot(-N) = -X \ &F= X + X\odot(XA) = X-X = 0 \ }$$ where $N$ is the all-ones matrix. – greg Jul 07 '23 at 15:16
  • While the problem I am trying to solve is more complex, thank you. This might help in the long run in solving part of my problem. – Robby Ram Jul 07 '23 at 20:01