
I am currently trying to solve exercise 1.1 from Bishop's book Pattern Recognition and Machine Learning.

The exercise requires me to substitute $$y(x,\mathbf w) = \sum_{j=0}^M w_jx^j$$ into $$E(\mathbf w) = \frac{1}{2}\sum_{n=1}^N \{y(x_n,\mathbf w) -t_n\}^2 $$

and then differentiate with respect to $w_i$ and set the result to zero, which leads to $$ \sum_{n=1}^N \biggl( \sum_{j=0}^M w_j x_n^j -t_n \biggr) x_n^i = 0$$

I can't figure out how to differentiate this, especially $y(x,\mathbf w)$, and where the sudden index $i$ in the result comes from.

Thanks for any help.

Suzu Hirose
Pascal

1 Answer


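Keep in mind that you are differentiating with respect to a single weight $w_i$, not the entire weight vector. Here $i$ is a fixed index that picks out which weight you differentiate with respect to; it is distinct from the summation index $j$. In the sum $\sum_{j=0}^M w_jx^j$, every term with $j \neq i$ is constant with respect to $w_i$, so only the $j = i$ term survives: $$\frac{\partial y}{\partial w_i}=x^i$$

Now apply the chain rule to $E(\mathbf w)$. The factor of 2 from differentiating the square cancels the $\frac{1}{2}$, so $$\frac{\partial E}{\partial w_i}=\sum_{n=1}^N\{y(x_n, \mathbf w)-t_n\}\frac{\partial y(x_n, \mathbf w)}{\partial w_i}$$

Substituting $y(x_n, \mathbf w)=\sum_{j=0}^Mw_jx_n^j$ and $\frac{\partial y(x_n, \mathbf w)}{\partial w_i}=x_n^i$, we get $$\frac{\partial E}{\partial w_i}=\sum_{n=1}^N\Biggl(\sum_{j=0}^Mw_jx^j_n-t_n\Biggr)x^i_n$$ Setting this to zero gives the desired result.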
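If you want to convince yourself that the formula is right, a quick numerical check can help. The sketch below is only an illustration: the data points `xs`, `ts`, the weights `w`, and all function names are made up for this example and do not come from the book. It compares the analytic expression for $\partial E/\partial w_i$ with a central finite-difference estimate.

```python
def y(x, w):
    """Polynomial y(x, w) = sum_j w_j * x**j."""
    return sum(w_j * x**j for j, w_j in enumerate(w))


def error(w, xs, ts):
    """Sum-of-squares error E(w) = 1/2 * sum_n (y(x_n, w) - t_n)**2."""
    return 0.5 * sum((y(x, w) - t) ** 2 for x, t in zip(xs, ts))


def grad_analytic(w, xs, ts):
    """dE/dw_i = sum_n (y(x_n, w) - t_n) * x_n**i, for each i."""
    return [
        sum((y(x, w) - t) * x**i for x, t in zip(xs, ts))
        for i in range(len(w))
    ]


def grad_numeric(w, xs, ts, h=1e-6):
    """Central finite-difference estimate of dE/dw_i, for each i."""
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h   # perturb only the i-th weight
        w_minus[i] -= h
        grads.append((error(w_plus, xs, ts) - error(w_minus, xs, ts)) / (2 * h))
    return grads


# Arbitrary toy data and weights (assumptions for illustration only).
xs = [0.0, 0.5, 1.0, 1.5]
ts = [1.0, 0.2, -0.3, 0.8]
w = [0.1, -0.4, 0.7]  # M = 2, so i runs over 0, 1, 2

analytic = grad_analytic(w, xs, ts)
numeric = grad_numeric(w, xs, ts)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("largest difference:", max_diff)
```

Note that `grad_numeric` changes only one weight at a time, which is exactly what "differentiating with respect to $w_i$" means; the two gradients should agree up to rounding error.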

Badr B