2

I have a vector $\mathbf{x}$ (10 $\times$ 1), a vector $\mathbf{f}(\mathbf{x})$ (10 $\times$ 1) and an invertible square matrix $\mathbf{Z}(\mathbf{x})$ (10 $\times$ 10) which is block diagonal:

$$\mathbf{Z}(\mathbf{x}) = \begin{bmatrix}\mathbf{A} & 0\\\mathbf{C} & \mathbf{B}\end{bmatrix}$$

$\mathbf{A}$ is 4x4, $\mathbf{B}$ is 6x6 diagonal, $\mathbf{C}$ is 6x4 with columns 3 and 4 null.

I need to compute (numerically) $\Delta(\mathbf{x})$

$$\Delta(\mathbf{x}) = \frac{\partial(\mathbf{Z}^{-1}\mathbf{f})}{\partial \mathbf{x}}$$

I came up with the following:

$$\Delta(\mathbf{x}) = \frac{\partial(\mathbf{Z}^{-1})}{\partial \mathbf{x}}\mathbf{f}+ \mathbf{Z}^{-1}\frac{\partial\mathbf{f}}{\partial \mathbf{x}}$$

The second term is trivial, but I can't compute $\frac{\partial(\mathbf{Z}^{-1})}{\partial \mathbf{x}}$.
I've mixed matrix-by-vector derivative and inverse derivative (I'm not sure it is legal, however) and got this:

$$\frac{\partial(\mathbf{Z}^{-1})}{\partial \mathbf{x}} = -\mathbf{Z}^{-1}\frac{\partial\mathbf{Z}}{\partial \mathbf{x}}\mathbf{Z}^{-1}$$

The middle term is a $n \times 1$ vector of $n \times n$ matrices, according to Wikipedia.

How am I supposed to multiply a vector of matrices and two matrices in order to obtain a $n \times n$ matrix?

And if the last deduction is wrong, how to compute $\frac{\partial(\mathbf{Z}^{-1})}{\partial \mathbf{x}}$?

Astrinus
  • 131

2 Answers2

2

$\def\v{{\rm vec}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$Assume the following gradients are known $$\eqalign{ J &= \p{f}{x},\qquad {\mathcal H} = \p{Z}{x} \\ }$$ Note that $(J,{\mathcal H})\,$ are $\,(2^{nd},\,3^{rd})$ order tensors, respectively.

Flattening $(Z,{\cal H})$ makes them much easier to work with $$z=\v(Z),\qquad H = \p{z}{x}$$ Let's give the objective function a convenient name $$w = Z^{-1}f$$ and find its differential and gradient $$\eqalign{ dw &= Z^{-1}\,df + dZ^{-1}\,f \\ &= Z^{-1}J\,dx - Z^{-1}\,dZ\,Z^{-1}f \\ &= Z^{-1}J\,dx - Z^{-1}\,dZ\,w \\ &= Z^{-1}J\,dx - \left(w^T\otimes Z^{-1}\right)dz \\ &= \Big(Z^{-1}J - \left(w^T\otimes Z^{-1}\right)H\Big)\,dx \\ \p{w}{x} &= Z^{-1}J - \left(w^T\otimes Z^{-1}\right)H \\ }$$ where $\otimes$ denotes the Kronecker product which is required to vectorize a matrix expression.

The problem with the accepted answer is that it assumes a product rule for the gradient with respect to a vector, which turns out to be false, i.e. $$\eqalign{ \frac{\partial (My)}{\partial x} \ne \bigg(\frac{\partial M}{\partial x}\bigg)y + M\bigg(\frac{\partial y}{\partial x}\bigg) \cr }$$ For differentials however, the analogous rule is valid $$d(My) = \big(dM\big)\,y + M\,\big(dy\big) $$ The underlying issue is that taking the gradient of a variable changes its tensorial character, while taking its differential does not.

For example, $\frac{\partial y}{\partial x}$ is a matrix whereas $dy$ is just a vector.

Similarly, $\frac{\partial M}{\partial x}$ is a third-order tensor while $dM$ is a matrix.

greg
  • 35,825
-1

I've found that the solution is much more simple than I thought.

Consider the whole term $\frac{\partial(\mathbf{Z}^{-1})}{\partial \mathbf{x}}\mathbf{f} = -\mathbf{Z}^{-1}\frac{\partial\mathbf{Z}}{\partial \mathbf{x}}\mathbf{Z}^{-1}\mathbf{f}$, with the vector $\mathbf{f}$ too; the derivative term in the right member has the form

$$ \frac{\partial\mathbf{Z}}{\partial \mathbf{x}} = \begin{bmatrix}\frac{\partial\mathbf{Z}}{\partial x_1} & \frac{\partial\mathbf{Z}}{\partial x_2} & \frac{\partial\mathbf{Z}}{\partial x_3} & \cdots\end{bmatrix}$$

whose elements are $n \times n$ matrices.

Now, swap multiplications and vector-of-matrices:

$$\frac{\partial(\mathbf{Z}^{-1})}{\partial \mathbf{x}}\mathbf{f}= \begin{bmatrix}-\mathbf{Z}^{-1}\frac{\partial\mathbf{Z}}{\partial x_1}\mathbf{Z}^{-1}\mathbf{f} & -\mathbf{Z}^{-1}\frac{\partial\mathbf{Z}}{\partial x_2}\mathbf{Z}^{-1}\mathbf{f} & -\mathbf{Z}^{-1}\frac{\partial\mathbf{Z}}{\partial x_3}\mathbf{Z}^{-1}\mathbf{f} & \cdots\end{bmatrix}$$

These elements are column vectors, each equal to $\frac{\partial(\mathbf{Z}^{-1})}{\partial x_i}\mathbf{f}$, and their concatenation yields a $n \times n$ matrix, summable with the other term.

I'm still not sure if this is true for $n \gt 3$, so if someone could point out pitfalls I have run in, I'll be grateful.

Astrinus
  • 131