I was going through the backpropagation equations in Andrew Ng's Deep Learning course and came across this set of equations for a two-layer neural network:

$dZ^{[2]} = A^{[2]} - y$

$dW^{[2]} = \frac{1}{m} \, dZ^{[2]} A^{[1]T}$

$dZ^{[1]} = \left( W^{[2]T} dZ^{[2]} \right) \odot g^{[1]'}(Z^{[1]})$ (where $\odot$ is the element-wise product)

$dW^{[1]} = \frac{1}{m} \, dZ^{[1]} X^{T}$

Where

$A^{[i]}$ is the matrix of activation values for the $i^{th}$ layer.

$y$ is the target value.

$Z^{[i]}$ is the weighted input (pre-activation) of the $i^{th}$ layer, so that $A^{[i]} = g^{[i]}(Z^{[i]})$.

$W^{[i]}$ is the weight matrix connecting the $(i-1)^{th}$ layer to the $i^{th}$ layer.

$g^{[i]}()$ is the activation function for the $i^{th}$ layer.

$X$ is the input to the neural network.

I've intentionally ignored the bias terms to make it simpler.
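As a sanity check, the four equations can be written out in NumPy and compared against finite-difference gradients of the cost. This is only a minimal sketch under some assumptions that are not part of the course material quoted above: $g^{[1]}$ is taken to be tanh, the output activation is a sigmoid, each column of $X$ is one training example, the product with $g^{[1]'}(Z^{[1]})$ is element-wise, and all function names and layer sizes are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, W2, X):
    # Forward pass without biases; columns of X are training examples.
    Z1 = W1 @ X
    A1 = np.tanh(Z1)          # assumed hidden activation g1 = tanh
    Z2 = W2 @ A1
    A2 = sigmoid(Z2)          # sigmoid output layer
    return Z1, A1, Z2, A2

def cost(W1, W2, X, Y):
    # Cross-entropy cost averaged over the m examples.
    m = X.shape[1]
    A2 = forward(W1, W2, X)[3]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def backward(W1, W2, X, Y):
    # The four equations from the question, in order.
    m = X.shape[1]
    Z1, A1, Z2, A2 = forward(W1, W2, X)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(Z1) = 1 - tanh(Z1)^2, element-wise
    dW1 = dZ1 @ X.T / m
    return dW1, dW2

def numeric_grad(f, W, eps=1e-6):
    # Central finite differences, perturbing one entry of W at a time (in place).
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# Hypothetical sizes: 3 inputs, 4 hidden units, 5 examples.
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.normal(size=(n_h, n_x)) * 0.5
W2 = rng.normal(size=(1, n_h)) * 0.5

dW1, dW2 = backward(W1, W2, X, Y)
num_dW1 = numeric_grad(lambda: cost(W1, W2, X, Y), W1)
num_dW2 = numeric_grad(lambda: cost(W1, W2, X, Y), W2)
max_err_W1 = np.max(np.abs(dW1 - num_dW1))
max_err_W2 = np.max(np.abs(dW2 - num_dW2))
print(max_err_W1, max_err_W2)
```

If the two printed differences are tiny, the equations agree with the numerical gradient of $E$ for this particular setup. That does not replace a derivation, but it does suggest the element-wise reading of the third equation is the intended one.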

I understand that the first equation represents the error in the last layer, and that the second equation is derived from $\partial E / \partial W^{[2]}$, where $E = -\frac{1}{m} \sum_{k=1}^{m} \left[ y^{(k)} \log a^{[2](k)} + (1 - y^{(k)}) \log(1 - a^{[2](k)}) \right]$ is the cost over the $m$ training examples and the output activation function is a sigmoid.

I would like to know the formal derivation for the third equation.

  • The third equation is the backpropagation of the error to the previous layer in order to calculate $\partial{E}/\partial{W^{[1]}}$ using the fourth equation. – BDN Jun 25 '18 at 09:00
  • But is there any formal derivation to it given the first two equations and the cost function? – Ashu Jun 25 '18 at 12:18
  • The formal derivation is to take the equations used to calculate the output of the ANN and derive the partial derivatives of the output w.r.t. the synaptic weights. – BDN Jun 26 '18 at 10:07
  • This may be helpful http://neuralnetworksanddeeplearning.com/chap2.html – David Jul 19 '19 at 11:55
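
The chain-rule step described in the comments can be sketched as follows. This assumes the forward pass $Z^{[1]} = W^{[1]} X$, $A^{[1]} = g^{[1]}(Z^{[1]})$, $Z^{[2]} = W^{[2]} A^{[1]}$ (no biases, one training example per column), and reads $dZ^{[i]}$ as the derivative of the per-example loss $L$ with respect to $Z^{[i]}$, with the $1/m$ factor appearing only in the $dW$ equations. Lower-case symbols denote a single example (one column), and the indices $j$, $k$ are introduced here for illustration.

$\frac{\partial L}{\partial z^{[1]}_j} = \sum_k \frac{\partial L}{\partial z^{[2]}_k} \, \frac{\partial z^{[2]}_k}{\partial a^{[1]}_j} \, \frac{\partial a^{[1]}_j}{\partial z^{[1]}_j}$

Since $z^{[2]}_k = \sum_j W^{[2]}_{kj} a^{[1]}_j$ and $a^{[1]}_j = g^{[1]}(z^{[1]}_j)$:

$\frac{\partial z^{[2]}_k}{\partial a^{[1]}_j} = W^{[2]}_{kj}, \qquad \frac{\partial a^{[1]}_j}{\partial z^{[1]}_j} = g^{[1]'}(z^{[1]}_j)$

Substituting, with $dz^{[2]}_k = \partial L / \partial z^{[2]}_k$:

$dz^{[1]}_j = \left( \sum_k W^{[2]}_{kj} \, dz^{[2]}_k \right) g^{[1]'}(z^{[1]}_j) = \left( W^{[2]T} dz^{[2]} \right)_j \, g^{[1]'}(z^{[1]}_j)$

Stacking the examples as columns gives $dZ^{[1]} = \left( W^{[2]T} dZ^{[2]} \right) \odot g^{[1]'}(Z^{[1]})$, where $\odot$ is the element-wise product.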

0 Answers