
$\frac{d}{d\mathbf{w}}X\operatorname{sig}(X^T\mathbf{w})=X\operatorname{diag}\big(\operatorname{sig}(X^T\mathbf{w})\odot (1-\operatorname{sig}(X^T\mathbf{w}))\big)X^T$

I would like to know the steps that lead to the above result. Here $\operatorname{sig}$ is the element-wise sigmoid function, and the convention is denominator layout. Thank you.
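For reference, these are the shapes implied by the expression. The question does not state them, so this assumes $X\in\mathbb{R}^{n\times m}$, in which case $X^T\mathbf{w}$ is only defined for $\mathbf{w}\in\mathbb{R}^{n}$:
$$
\mathbf{w}\in\mathbb{R}^{n},\quad X^T\mathbf{w}\in\mathbb{R}^{m},\quad \operatorname{sig}(X^T\mathbf{w})\in\mathbb{R}^{m},\quad X\operatorname{sig}(X^T\mathbf{w})\in\mathbb{R}^{n},\quad X\operatorname{diag}(\cdots)X^T\in\mathbb{R}^{n\times n}
$$
In denominator layout, the derivative of an $n$-vector $\mathbf{f}$ with respect to an $n$-vector $\mathbf{w}$ is the $n\times n$ matrix whose $(i,j)$ entry is $\partial f_j/\partial w_i$.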

7337dtd



The derivative of the logistic function $s={\rm sig}(y)$, considered as an ordinary (scalar) function, is
$$\frac{ds}{dy} = s-s^2 \quad\implies\quad ds = (s-s^2)\,dy$$
Applying the function element-wise to a vector argument $y=X^Tw$ yields vector results
$$\eqalign{ s &= {\rm sig}(y) \\ ds &= (s-s\odot s)\odot dy \\ }$$
The elementwise (Hadamard) products can be replaced by multiplication with a diagonal matrix:
$$\eqalign{ S &= {\rm Diag}(s) \\ ds &= (S-S^2)\,dy \\ }$$
Now apply this to the function in the question and compute its gradient:
$$\eqalign{ f &= Xs \\ df &= X\,ds \\ &= X(S-S^2)\,dy \\ &= X(S-S^2)X^T\,dw \\ &= XS(I-S)X^T\,dw \\ \frac{\partial f}{\partial w} &= XS(I-S)X^T \\ }$$
Since $S(I-S)={\rm Diag}\big(s\odot({\tt1}-s)\big)$, this is the same as the expression in the question. Moreover, $XS(I-S)X^T$ is symmetric, so the numerator and denominator layout conventions give the same matrix.
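A quick way to sanity-check the final result is to compare $XS(I-S)X^T$ against a central finite-difference Jacobian. The following is a minimal sketch in plain Python (no NumPy); the helper names, the random $3\times 5$ test matrix, and the step size `h` are illustrative choices, not part of the question or the answer. The numerical Jacobian is built with entry $(i,j)=\partial f_i/\partial w_j$, which is the same matrix in either layout because the result is symmetric.

```python
import math
import random

def sig(t):
    # scalar logistic function
    return 1.0 / (1.0 + math.exp(-t))

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def f(X, w):
    # f(w) = X sig(X^T w), with sig applied element-wise
    y = matvec(transpose(X), w)
    return matvec(X, [sig(t) for t in y])

def analytic_jacobian(X, w):
    # X S (I - S) X^T with S = Diag(sig(X^T w));
    # entry (i, j) = sum_k X[i][k] * s_k (1 - s_k) * X[j][k]
    s = [sig(t) for t in matvec(transpose(X), w)]
    d = [sk * (1.0 - sk) for sk in s]
    n, m = len(X), len(X[0])
    return [[sum(X[i][k] * d[k] * X[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(X, w, h=1e-6):
    # central differences: entry (i, j) = (f_i(w + h e_j) - f_i(w - h e_j)) / (2h)
    n = len(w)
    J = [[0.0] * n for _ in range(len(X))]
    for j in range(n):
        wp = list(w)
        wp[j] += h
        wm = list(w)
        wm[j] -= h
        fp, fm = f(X, wp), f(X, wm)
        for i in range(len(X)):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

# hypothetical test data: random X (3 x 5) and w (length 3)
random.seed(0)
n, m = 3, 5
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]

A = analytic_jacobian(X, w)
N = numeric_jacobian(X, w)
max_err = max(abs(A[i][j] - N[i][j]) for i in range(n) for j in range(n))
print("max |analytic - numeric| =", max_err)
```

The reported difference should be on the order of the finite-difference error (tiny), and swapping in any other real matrix `X` of compatible shape should give the same agreement.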

greg
  • Hi, thank you so much for the answer. Could you explain why the middle term $S(I-S)\neq \operatorname{diag}\big(\operatorname{sig}(X^T\mathbf{w})\odot(1-\operatorname{sig}(X^T\mathbf{w}))\big)$ once I correct the bold symbol in my expression? I think that was a typo. – 7337dtd Nov 03 '20 at 13:57
  • @7337dtd You're right, $S(I-S)={\rm Diag}\Big(s\odot({\tt1}-s)\Big)$; it just looks awkward. – greg Nov 03 '20 at 14:51
  • Sure, thank you. – 7337dtd Nov 03 '20 at 15:09
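Entry by entry, the identity $S(I-S)={\rm Diag}\big(s\odot({\tt1}-s)\big)$ follows from $S_{ik}=s_i\delta_{ik}$, where $\delta$ is the Kronecker delta:
$$\big[S(I-S)\big]_{ij}=\sum_k s_i\delta_{ik}\,(\delta_{kj}-s_k\delta_{kj}) = s_i(\delta_{ij}-s_i\delta_{ij}) = \delta_{ij}\,s_i(1-s_i)$$
which is exactly the $(i,j)$ entry of ${\rm Diag}\big(s\odot({\tt1}-s)\big)$.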