The following is a lecture slide from a machine learning class:

Cross Entropy


For classification tasks, the target $t$ is either $0$ or $1$, so it is better to use $$E=-t\log(z)-(1-t)\log(1-z)$$ This can be justified mathematically, and works well in practice, especially when negative examples vastly outweigh positive ones. It also makes the backprop computations simpler: $$\begin{align}\frac{\partial E}{\partial z}&=\frac{z-t}{z(1-z)}\\ \text{if}\qquad z&=\frac{1}{1+e^{-s}},\\ \frac{\partial E}{\partial s}&=\frac{\partial E}{\partial z}\frac{\partial z}{\partial s}=z-t\end{align}$$
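The slide's two closed forms, $\frac{\partial E}{\partial z}=\frac{z-t}{z(1-z)}$ and $\frac{\partial E}{\partial s}=z-t$, can be checked numerically against finite differences. This is a minimal sketch, assuming a single scalar output $z=\sigma(s)$ and a fixed target $t\in\{0,1\}$; the helper names (`sigmoid`, `cross_entropy`, `central_diff`, etc.) are illustrative and do not come from the slide.

```python
import math

def sigmoid(s: float) -> float:
    """Logistic function z = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

def cross_entropy(z: float, t: float) -> float:
    """E = -t log(z) - (1 - t) log(1 - z) for a single output z in (0, 1)."""
    return -t * math.log(z) - (1.0 - t) * math.log(1.0 - z)

def dE_dz(z: float, t: float) -> float:
    """Slide's closed form: (z - t) / (z (1 - z))."""
    return (z - t) / (z * (1.0 - z))

def dE_ds(s: float, t: float) -> float:
    """Slide's closed form after applying the chain rule: z - t."""
    return sigmoid(s) - t

def central_diff(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for t in (0.0, 1.0):
        for s in (-2.0, 0.5, 3.0):
            z = sigmoid(s)
            # t is held fixed, so each check differentiates a function of one variable.
            num_z = central_diff(lambda zz: cross_entropy(zz, t), z)
            num_s = central_diff(lambda ss: cross_entropy(sigmoid(ss), t), s)
            print(f"t={t} s={s}: dE/dz {num_z:.6f} vs {dE_dz(z, t):.6f}, "
                  f"dE/ds {num_s:.6f} vs {dE_ds(s, t):.6f}")
```

Because $t$ stays constant inside each lambda, both numerical derivatives are ordinary one-variable derivatives, and they agree with the slide's formulas up to finite-difference error.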

  1. My understanding was that error functions are functions of a single variable. If so, and if I'm not mistaken, $E$ in the above slide is a function of $z$ only, and $z$ is a function of $s$ only. Therefore, by the chain rule, shouldn't we have $\dfrac{dE}{ds} = \dfrac{dE}{dz}\dfrac{dz}{ds}$? After all, if my claim is correct, then none of these functions are multivariable functions; rather, they are compositions of (nested) functions.

  2. But even if we assume that $E$ is a function of both $z$ and $t$, this still doesn't make sense, because $z$ is a function of only one variable, namely $s$. So shouldn't we have $\dfrac{\partial E}{\partial s} = \dfrac{\partial E}{\partial z}\dfrac{dz}{ds}$?
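One way to read the slide's notation, assuming $t$ is regarded as a fixed label rather than a variable being differentiated, is to write $E$ as a function of two arguments and then freeze $t$. The name $g_t$ below is introduced only for illustration:

$$\begin{align}E(z,t)&=-t\log(z)-(1-t)\log(1-z),\\ g_t(s)&=E\big(\sigma(s),\,t\big),\qquad \sigma(s)=\frac{1}{1+e^{-s}},\\ \frac{dg_t}{ds}&=\frac{\partial E}{\partial z}\bigg|_{z=\sigma(s)}\frac{d\sigma}{ds}=\frac{z-t}{z(1-z)}\cdot z(1-z)=z-t.\end{align}$$

With $t$ fixed, $g_t$ depends on $s$ alone, so $\frac{dg_t}{ds}$ is an ordinary derivative; it equals the partial derivative $\frac{\partial E}{\partial s}$ of the two-input map $(s,t)\mapsto E(\sigma(s),t)$ taken with $t$ held constant, and it uses $\frac{d\sigma}{ds}=\sigma(s)\big(1-\sigma(s)\big)=z(1-z)$.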

I would greatly appreciate it if people could please take the time to clarify this.

The Pointer
  • People use partial derivative notation for functions of one variable sometimes. – mathworker21 Jul 08 '18 at 06:57
  • @mathworker21 Hmm, but isn't that technically incorrect? It implies that the function is multivariable, which causes confusion (case in point)? – The Pointer Jul 08 '18 at 06:58
  • @mathworker21 The answer (and comment) to this question seems to corroborate your claim: https://math.stackexchange.com/questions/916121/partial-derivative-of-the-one-variable-function – The Pointer Jul 08 '18 at 07:01
  • I don't think there's any technical definition of what it means to be multivariate. In certain situations, some things that are technically variables become viewed as constants. – mathworker21 Jul 08 '18 at 07:01
  • @mathworker21 Ok, I think you've cleared up my confusion. Thanks for the clarification. – The Pointer Jul 08 '18 at 07:01
  • I agree with you about its annoyance though. No problem – mathworker21 Jul 08 '18 at 07:02
