1

I was looking at back propagating a gradient through a computational graph, and all makes sense aside from when a node has multiple outputs. Take the following function:

$$f(x) = (x+3)(x+2)$$

Which obviously has the derivative:

$$f'(x) = 2x+5$$

Now, take the following segment of a computational graph:

graph 1

The numbers above a line represent the value on a forward pass, the numbers beneath it represent the back propagated gradient.

When moving the gradient backwards through the $f(x)$ node, I took the sum of $0.34$ and $-0.2$, then multiplied this sum by the derivative of $f(x)$, with an input of $2$ (taken from the forward pass). So: $(0.34-0.2)\times f'(2) = 1.26$. I understand I had to multiply the $f'(x)$ (following the chain rule), but I do not understand why I had to sum the $0.34$ and $-0.2$. I only did so, because I know that's what I'm meant to do.

Any help is greatly appreciated.

Recessive
  • 307
  • Your example does not make sense or the graph is incomplete. $f$ only has one output. The situation that you seem do describe happens at the $x$ node where the value is used in the two factors. – Lutz Lehmann May 28 '19 at 09:58
  • Yes, it's a segment as I said. An example function where this might appear would be something like $g(x)=0.34f(x^2)-0.2f(x^2)$. Obviously the $f(x^2)$ can be factorised out, but this isn't always the case, hence where you may end up with the case above. – Recessive May 28 '19 at 10:02
  • Is this from neural networks? – Rodrigo de Azevedo Sep 26 '20 at 07:46

2 Answers2

2

In a computation like $$ f(x)=a(u(x))\cdot b(u(x)) $$ you can derive the gradient rules by starting with the universal relation of automatic/algorithmic differentiation between gradients $\bar f, \bar b,..$ and directional derivatives $\dot x,\dot u,..$ which are $$ \bar f\dot f = \bar a\dot a+\bar b\dot b=\bar u\dot u=\bar x\dot x $$ Now you can insert the chain rule and evaluate it in both directions of the identity chain. $\dot f=\dot ab+a\dot b$, $\dot a = a'(u)\dot u$, $\dot b = b'(u)\dot u$ so that $$ \bar f\dot ab+\bar fa\dot b= \bar a\dot a+\bar b\dot b $$ which has to be true for all direction vectors implying $\bar a=\bar f b$, $\bar b=\bar f a$ and $$ \bar aa'(u)\dot u+\bar bb'(u)\dot u=\bar u\dot u $$ implying $\bar u = \bar aa'(u)+\bar bb'(u)$.

Lutz Lehmann
  • 126,666
  • I get a 403 error for the link. Forbidden – Recessive May 29 '19 at 03:35
  • Should work now. I was so used that the links to the base without "www." work too that I did not check. Other sources like the wikipedia page or even books (Griewank et al., Naumann) should help too. – Lutz Lehmann May 29 '19 at 06:06
  • Thankyou for your answer, I must admit that it's a bit beyond my understanding of the topic, so I can't in good faith mark it as an answer, as I'm not sure what it means. I think it was more my mistake of assuming it was a trivial thing I was missing, but it seems there's more to it then that. – Recessive May 29 '19 at 23:49
0

A 1-input and 2-output gate is equivalent to two 1-input and 1-output gates, as the output functions are the same [$f(x)=(x+3)(x+2)$ in this case].

Now, let the gradient flowing into each of the gates be $g_1$ and $g_2$. The gradient update for each inputs will be $f'(x) * g_1$ and $f'(x) * g_2$.

However, each inputs are the same. Hence, it would be the same if you did $$x = x + f'(x) * g_1$$ $$x = x + f'(x) * g_2$$ or just $$x = x + f'(x) * (g_1 + g_2)$$ at once.

Moltres
  • 101