
In the free online book, the following is stated:

If $C$ is a cost function that depends on $v_1, v_2, \ldots, v_n$, the author states that we should make a move $\Delta v$ that decreases $C$ as much as possible, which is equivalent to minimizing $\Delta C \approx \nabla C \cdot \Delta v$. So if $\|\Delta v\| = \epsilon$ for some small $\epsilon > 0$, it can be proved that the choice of $\Delta v$ that minimizes $\Delta C \approx \nabla C \cdot \Delta v$ is $\Delta v = -\eta \nabla C$, where $\eta = \epsilon / \|\nabla C\|$. The book suggests using the Cauchy-Schwarz inequality.
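One way to see the claim, written as a short worked derivation (a sketch, assuming $\nabla C \neq 0$ so that $\eta$ is defined). Cauchy-Schwarz bounds how negative the dot product can get for any step of length $\epsilon$:

$$|\nabla C \cdot \Delta v| \le \|\nabla C\|\,\|\Delta v\| = \epsilon\,\|\nabla C\|, \quad\text{so}\quad \nabla C \cdot \Delta v \ge -\epsilon\,\|\nabla C\|.$$

The proposed step reaches that bound exactly:

$$\Delta v = -\eta\,\nabla C,\ \ \eta = \frac{\epsilon}{\|\nabla C\|} \;\Longrightarrow\; \|\Delta v\| = \eta\,\|\nabla C\| = \epsilon, \quad \nabla C \cdot \Delta v = -\eta\,\|\nabla C\|^2 = -\epsilon\,\|\nabla C\|.$$

Equality in Cauchy-Schwarz holds only when $\Delta v$ is a scalar multiple of $\nabla C$, and the multiple must be negative for the dot product to be negative, so this step is the minimizer.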

I don't have a background in mathematics. I have done a lot of reading, but I am struggling to know where to start: even after all that reading, I have no conceptual understanding of why the Cauchy-Schwarz inequality is relevant here. Could somebody help me?

par
  • Formatting tips here: http://meta.math.stackexchange.com/q/5020/321264 – StubbornAtom Aug 28 '16 at 18:21
  • I'm not sure you will really get this topic without background in mathematics – Yuriy S Aug 28 '16 at 18:25
  • @StubbornAtom thanks, fixed. – par Aug 28 '16 at 18:28
  • Note that if you moved in the direction $v$, the change in cost will be approximately $v \cdot \nabla C$. Cauchy-Schwarz says that $|v \cdot \nabla C| \le \|v\| \, \|\nabla C\|$, with equality when $v = a \nabla C$ for some scalar $a$. This means that to get the maximum reduction, I should move either along or directly opposite to $\nabla C$. It's relatively easy to see that I should indeed move opposite to $\nabla C$ in order to get a reduction at all, and that finishes the story. – stochasticboy321 Aug 28 '16 at 18:29
  • That said, I'd second Yuriy in that understanding ML would require a fairly firm grounding in vector calc and linear algebra, and you'd likely be better off investing time in these first instead of jumping right in, especially if a proof like this is currently out of reach. – stochasticboy321 Aug 28 '16 at 18:32
  • @stochasticboy321 I understand the notation, and of course I understand linear algebra. I guess the idea here is that the two sides of C-S are equal, and if that is the basis, it is a simple rearrangement to show that $\eta = \epsilon / \|\nabla C\|$. But I don't understand why the two sides of C-S must be equal? – par Aug 28 '16 at 18:33
  • C-S gives you an upper bound on the absolute value of the reduction possible. If you reach this maximum (which you'd like to), you must then (tautologically) satisfy C-S with equality. – stochasticboy321 Aug 28 '16 at 18:36
  • Perfect, it just clicked thanks. I guess that should have been obvious :) – par Aug 28 '16 at 18:38
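A quick numerical sanity check of the argument in the comments, offered as a sketch rather than anything from the book: a random vector stands in for $\nabla C$ (hypothetical data), and the hypothetical helper `steepest_step` builds $\Delta v = -\eta \nabla C$ with $\eta = \epsilon / \|\nabla C\|$. The loop then checks that no random step of the same length $\epsilon$ gives a smaller linear estimate $\nabla C \cdot \Delta v$, which is exactly what the Cauchy-Schwarz bound predicts. It assumes NumPy is available.

```python
import numpy as np

def steepest_step(grad, eps):
    """Hypothetical helper: return -eta * grad with eta = eps / ||grad||."""
    eta = eps / np.linalg.norm(grad)
    return -eta * grad

rng = np.random.default_rng(0)
grad = rng.normal(size=5)  # stand-in for the gradient of C at some point
eps = 0.01                 # step length ||Delta v||

best = steepest_step(grad, eps)
best_change = grad @ best  # linear estimate Delta C ~ grad . Delta v

# Every other step of length eps should do no better (up to rounding error).
for _ in range(10_000):
    d = rng.normal(size=5)
    d *= eps / np.linalg.norm(d)  # rescale the random direction to length eps
    assert grad @ d >= best_change - 1e-12

# The two printed values should agree: the Cauchy-Schwarz lower bound is attained.
print(best_change, -eps * np.linalg.norm(grad))
```

The random search is only a spot check, not a proof; the Cauchy-Schwarz argument above is what guarantees the result for every direction.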
