Backpropagation in recurrent neural networks

Question

How do recurrent neural networks share weights? I have been reading about it online, but I can't figure out how this works, particularly because during backpropagation the hidden cell at, e.g., t=2 receives the gradients coming from t=3. So the weights of the two cells would then be updated differently.

For example, after one iteration, will the weights w1, w2, w3, w4, w5 be similar? w4 would receive gradients from its own output as well as from the output of w5, whereas w5 would only receive the gradient from its own output.

score 2 · Accepted Answer · answered Oct 20 '17 at 18:51

Unroll the recurrent neural network to obtain a layered non-recurrent neural network.
Run it with your input and compute as usual (using backpropagation) the error and its gradient with respect to the weights of each layer, $\frac{\partial E}{\partial W_{i,j}(t)}$.
Finally, update the weights $W_{i,j}$ of your recurrent neural network with $$W_{i,j} \leftarrow W_{i,j} - \eta \sum_t \frac{\partial E}{\partial W_{i,j}(t)}$$
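
The three steps above can be sketched in NumPy roughly as follows. This is only an illustration under assumptions that are not in the answer: a `tanh` hidden layer, an input matrix `U`, a squared error taken on the last hidden state only, and the helper names `forward`, `loss`, `per_step_grads` and `numerical_grad`, which are all made up for the example. It computes one gradient per unrolled layer, sums them, and checks the sum against a finite-difference gradient of the tied-weight network.

```python
import numpy as np

def forward(W, U, xs, h0):
    """Unrolled forward pass; returns the hidden states h_0 .. h_T."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(W @ hs[-1] + U @ x))
    return hs

def loss(W, U, xs, h0, y):
    """Assumed error: squared distance of the last hidden state to y."""
    return 0.5 * np.sum((forward(W, U, xs, h0)[-1] - y) ** 2)

def per_step_grads(W, U, xs, h0, y):
    """Backpropagate through the unrolled layers; returns dE/dW(t) for each t."""
    hs = forward(W, U, xs, h0)
    delta_h = hs[-1] - y                      # dE/dh_T
    grads = [None] * len(xs)
    for t in reversed(range(len(xs))):
        delta_a = delta_h * (1.0 - hs[t + 1] ** 2)   # through tanh
        grads[t] = np.outer(delta_a, hs[t])          # gradient for the copy W(t)
        delta_h = W.T @ delta_a                      # pass the error to layer t-1
    return grads

def numerical_grad(W, U, xs, h0, y, eps=1e-6):
    """Central finite differences on the shared W, used only as a sanity check."""
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (loss(Wp, U, xs, h0, y) - loss(Wm, U, xs, h0, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n_hidden, n_in, T = 3, 2, 5
W = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
U = rng.normal(scale=0.5, size=(n_hidden, n_in))
xs = [rng.normal(size=n_in) for _ in range(T)]
h0 = rng.normal(scale=0.1, size=n_hidden)
y = rng.normal(size=n_hidden)

grads = per_step_grads(W, U, xs, h0, y)   # one gradient per unrolled layer
total_grad = sum(grads)                   # sum over t, as in the update rule
eta = 0.1
W_new = W - eta * total_grad              # single update of the shared W

print("max |BPTT - numerical|:", np.max(np.abs(total_grad - numerical_grad(W, U, xs, h0, y))))
```

The individual entries of `grads` generally differ from one time step to the next, but only their sum is applied, and it is applied to the single shared `W`.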

reuns

Thank you for the reply. I have included a figure in my post. So based on what you have said, the recurrent layers will not have similar weights after one iteration? And I can essentially treat it as a layered non-recurrent neural network, as you have said? – Kong Oct 20 '17 at 19:53
@kong You need to understand the unroll step, which sets $W_{i,j}(t) = W_{i,j}$ for every $t$. At the end we update $W_{i,j}$ (equivalently, every $W_{i,j}(t)$ is updated the same way), so $W_{i,j}(t) = W_{i,j}$ always stays true. What depends on $t$ is $\frac{\partial E}{\partial W_{i,j}(t)}$. – reuns Oct 20 '17 at 19:56
So w1 = w2 = w3 = w4 = w5. During backprop, all the gradients that lead to w1 are summed. – Kong Oct 20 '17 at 20:19
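
Written out, the summation is just the chain rule applied to the tied copies. Treating the unrolled error as a function of the separate copies $W_{i,j}(1), \dots, W_{i,j}(t_{max})$, each of which is set equal to the shared $W_{i,j}$ (so that $\frac{\partial W_{i,j}(t)}{\partial W_{i,j}} = 1$), one gets, as a sketch in the notation of the answer:

$$\frac{\partial E}{\partial W_{i,j}} = \sum_{t=1}^{t_{max}} \frac{\partial E}{\partial W_{i,j}(t)} \cdot \frac{\partial W_{i,j}(t)}{\partial W_{i,j}} = \sum_{t=1}^{t_{max}} \frac{\partial E}{\partial W_{i,j}(t)}$$

In the notation of the question, with five unrolled cells, this would read $\frac{\partial E}{\partial w} = \sum_{k=1}^{5} \frac{\partial E}{\partial w_k}$, with every $w_k$ then receiving the same update.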
@kong Yes, exactly. And I didn't mention a main issue: which $t_{max}$ do you choose, and how do you define the error in the unrolled network (do you look only at the output of the last layer, or do you sum the errors of the outputs of the last few layers)? – reuns Oct 20 '17 at 20:29
Thank you again for your help. I now understand that the things you mention are design choices one must make when implementing backpropagation for an RNN, since they affect the update of W. – Kong Oct 20 '17 at 20:37
@kong And it affects how you will read the output from the RNN – reuns Oct 20 '17 at 20:38
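
To make the effect of the error definition concrete, here is a small sketch with a scalar RNN. Everything in it is an assumption made for illustration: the `tanh` cell, the input weight `u`, the toy inputs and targets, and the hypothetical helpers `run`, `error` and `grad_w`. The length of `xs` plays the role of $t_{max}$, and `last_k` selects how many of the final outputs enter the error.

```python
import numpy as np

def run(w, u, xs, h0=0.0):
    """Unrolled forward pass of a scalar RNN; returns h_0 .. h_T."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    return hs

def error(w, u, xs, ys, last_k):
    """Squared error summed over the last `last_k` time steps only."""
    hs = run(w, u, xs)[1:]
    return 0.5 * sum((h - y) ** 2 for h, y in zip(hs[-last_k:], ys[-last_k:]))

def grad_w(w, u, xs, ys, last_k):
    """BPTT gradient of `error` with respect to the shared weight w."""
    hs = run(w, u, xs)
    T = len(xs)
    g, delta_h = 0.0, 0.0
    for t in reversed(range(T)):
        if t >= T - last_k:                  # this step's output is in the error
            delta_h += hs[t + 1] - ys[t]
        delta_a = delta_h * (1.0 - hs[t + 1] ** 2)
        g += delta_a * hs[t]                 # contribution of the copy w(t)
        delta_h = w * delta_a                # error passed back to step t-1
    return g

w, u = 0.8, 0.5
xs = [1.0, -0.5, 0.3, 0.9]    # toy inputs, t_max = 4
ys = [0.2, 0.1, -0.3, 0.4]    # toy targets

g_last = grad_w(w, u, xs, ys, last_k=1)        # error on the last output only
g_all = grad_w(w, u, xs, ys, last_k=len(xs))   # error summed over all outputs
print("gradient, last output only:", g_last)
print("gradient, all outputs:     ", g_all)
```

The two definitions give different gradients for the same shared weight, so the resulting update of `w` depends on which one is chosen.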
