
Let $\boldsymbol{X}$ be an $n \times p$ matrix and $\boldsymbol{\beta}$ a $p$-dimensional vector. I'd like to calculate

$$ \frac{\partial f(\boldsymbol{X\beta})}{\partial\boldsymbol{\beta}} $$

I tried

$$ f'(\boldsymbol{X\beta}) \boldsymbol{X} $$

but, obviously, the dimensions are not correct.

  • What is the codomain of $f$? Your expression seems right if $f$ is a functional. – Masacroso May 12 '19 at 14:41
  • Real greater than zero. – Wagner Jorge May 12 '19 at 14:46
  • My CAS says $$ \boldsymbol{X} f'(\boldsymbol{X} \boldsymbol{\beta})$$ So you almost had it, except for the pre-multiplication. – John Alexiou May 12 '19 at 22:26
  • @ja72 It depends on how you understand matrix multiplication. If matrix multiplication is understood as $vM$, where $v$ is a vector and $M$ a matrix, then your result follows. However, it is more common to write matrix multiplication as $Mv$, in which case my result follows. In general, without using a matrix representation, the correct answer is $\partial f(X\beta)X$, because the chain rule is written from left to right; after all, it is just a composition of functions. Which CAS did you use? – Masacroso May 13 '19 at 13:58
  • I disagree, since in my notation it is always matrix-vector. $\boldsymbol{f}$ and $\boldsymbol{f'}$ are vector functions and $\boldsymbol{X}$ is a matrix. Where is the chain rule defined left to right? Why is $(f(g(x)))' = g'(x)\,f'(g(x))$ incorrect? – John Alexiou May 13 '19 at 16:20
  • @ja72 I explained this in my previous comment. To make it clearer, consult a multivariable calculus textbook. – Masacroso May 13 '19 at 20:09

2 Answers


Take an ordinary scalar function $\phi(z)$ and its derivative $\phi'(z)=\frac{d\phi}{dz}$, and apply them element-wise to a vector argument, i.e.

$$ v = X\beta,\quad f = \phi(v),\quad f' = \phi'(v) $$

The differential of such a vector function can be expressed using an elementwise $(\odot)$ product or, better yet, a diagonal matrix:

$$\eqalign{ df &= f'\odot dv \cr &= {\rm Diag}(f')\,dv \cr &= {\rm Diag}(f')\,X\,d\beta \cr }$$

Given this differential, the gradient with respect to $\beta$ can be identified as the matrix

$$ \frac{\partial f}{\partial \beta} = {\rm Diag}(f')\,X $$

An example of the equivalence of the Hadamard product and diagonalization:

$$ a = \pmatrix{a_1\\a_2},\quad b = \pmatrix{b_1\\b_2},\quad a\odot b = \pmatrix{a_1b_1\\a_2b_2} = b\odot a $$

$$ A = {\rm Diag}(a) = \pmatrix{a_1&0\\0&a_2},\quad Ab = \pmatrix{a_1b_1\\a_2b_2} $$

$$ B = {\rm Diag}(b) = \pmatrix{b_1&0\\0&b_2},\quad Ba = \pmatrix{a_1b_1\\a_2b_2} $$
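
As a numerical sanity check of ${\rm Diag}(f')X$, here is a minimal Python sketch. It assumes $\phi = \exp$ (so $\phi' = \exp$ as well), and the values of `X` and `beta` are made up for illustration; the helper names are hypothetical and not from any library. It compares the analytic Jacobian against central finite differences.

```python
import math

def matvec(X, b):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

def f(X, beta, phi=math.exp):
    """Apply the scalar function phi element-wise to v = X beta."""
    return [phi(v_i) for v_i in matvec(X, beta)]

def jacobian_analytic(X, beta, dphi=math.exp):
    """Diag(phi'(X beta)) X: row i of X scaled by phi'(v_i)."""
    v = matvec(X, beta)
    return [[dphi(v_i) * x_ij for x_ij in row] for v_i, row in zip(v, X)]

def jacobian_numeric(X, beta, phi=math.exp, eps=1e-6):
    """Central finite differences, one column per component of beta."""
    n, p = len(X), len(beta)
    J = [[0.0] * p for _ in range(n)]
    for j in range(p):
        up = list(beta)
        up[j] += eps
        dn = list(beta)
        dn[j] -= eps
        f_up, f_dn = f(X, up, phi), f(X, dn, phi)
        for i in range(n):
            J[i][j] = (f_up[i] - f_dn[i]) / (2 * eps)
    return J

X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # n = 3, p = 2 (made-up values)
beta = [0.1, -0.2]

J_a = jacobian_analytic(X, beta)
J_n = jacobian_numeric(X, beta)
max_err = max(abs(a - b) for ra, rb in zip(J_a, J_n) for a, b in zip(ra, rb))

print(len(J_a), len(J_a[0]))   # Jacobian shape: n rows, p columns
print(max_err < 1e-6)          # analytic and numeric Jacobians agree
```

Scaling row $i$ of $X$ by $\phi'(v_i)$ is exactly what left-multiplication by ${\rm Diag}(f')$ does, which is why the sketch never builds the diagonal matrix explicitly.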

greg
  • I'm not a mathematician; can you help me with the elementwise product? Specifically, in $df = f' \odot dv$, why can we replace the product with ${\rm Diag}(f')$? – Wagner Jorge May 12 '19 at 19:28

As you wrote, you have

$$\partial[f(X\beta)]=\partial f(X\beta) X$$

for $f:\Bbb R^n\to[0,\infty)$ and $X:\Bbb R^p\to\Bbb R^n$. Then $\partial f(X\beta)$ can be represented by the gradient $\nabla f(X\beta)$, which is a vector in $\Bbb R^n$ (treated here as a row vector), and $\nabla f(X\beta)X$ is a vector in $\Bbb R^p$, namely the gradient of $f\circ X$ at $\beta$. Hence

$$\partial f(X\beta) Xh=\nabla f(X\beta)X\cdot h=\nabla(f\circ X)(\beta)\cdot h$$

for any $h\in\Bbb R^p$, where the dot is the Euclidean dot product.
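
The identity can be checked numerically with a short Python sketch. The choice $f(v)=\sum_i e^{v_i}$ is only an assumed example of a positive scalar-valued function, the values of `X`, `beta` and `h` are made up, and the helper names are hypothetical. The sketch evaluates both sides of the identity and compares them with a finite-difference directional derivative.

```python
import math

def matvec(X, b):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

def dot(a, b):
    """Euclidean dot product."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def f(v):
    """Assumed example f: R^n -> (0, inf), f(v) = sum_i exp(v_i)."""
    return sum(math.exp(v_i) for v_i in v)

def grad_f(v):
    """Gradient of the example f at v (n entries)."""
    return [math.exp(v_i) for v_i in v]

def grad_composite(X, beta):
    """Row vector grad_f(X beta) times X, giving p entries."""
    g = grad_f(matvec(X, beta))
    return [sum(g[i] * X[i][j] for i in range(len(X))) for j in range(len(beta))]

X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # n = 3, p = 2 (made-up values)
beta = [0.1, -0.2]
h = [0.3, -0.7]                              # arbitrary direction in R^p

g_beta = grad_composite(X, beta)
lhs = dot(grad_f(matvec(X, beta)), matvec(X, h))   # derivative of f applied to X h
rhs = dot(g_beta, h)                               # gradient of f o X dotted with h

# Central finite difference of t -> f(X (beta + t h)) at t = 0.
eps = 1e-6
plus = [b + eps * d for b, d in zip(beta, h)]
minus = [b - eps * d for b, d in zip(beta, h)]
numeric = (f(matvec(X, plus)) - f(matvec(X, minus))) / (2 * eps)

print(len(g_beta))                 # the composite gradient has p entries
print(abs(lhs - rhs) < 1e-12)      # both sides of the identity agree
print(abs(numeric - rhs) < 1e-6)   # and match the finite-difference estimate
```

If gradients are stored as column vectors instead, the same quantity is written $X^{\top}\nabla f(X\beta)$, which is where the apparent dimension mismatch in the question comes from.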

Masacroso