-2

My question is how to derive that we can use the sigmoid function 1/(1+e^(-w*x)) to map w*x to probability space. What is the theory that proves the sigmoid function is the optimal method for this mapping? Thanks for your help.

Ntydrm
  • Wow, new contributor posts a question and gets anonymously downvoted more than 3 times, and without any comments from the downvoters. At times, it looks like math.SE is really full of sh*t. People should really relax! – dohmatob Apr 25 '20 at 08:19

1 Answer

0

The purpose of the sigmoid function is not to "map $w \cdot x$ to probability space"; it is to simulate threshold behaviour for the activation of perceptrons. It is a nice (smooth, monotone) function $\sigma : \mathbb{R}^{\operatorname{length}(w)} \to [0, 1]^{\operatorname{length}(w)}$, applied componentwise.

But those occurrences of the interval $[0, 1]$ aren't meaningful probability spaces; they are just the activation ranges of some perceptrons.
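To make the "smooth threshold" reading concrete, here is a minimal Python sketch. It is only an illustration under assumptions of mine: the helper names `sigmoid`, `step` and `activation`, and the `gain` steepness factor, are hypothetical and do not come from the question. It compares the sigmoid activation of a single unit with a hard threshold on $w \cdot x$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def step(z: float) -> float:
    """Hard threshold: the unit fires (1.0) when z >= 0, otherwise it does not (0.0)."""
    return 1.0 if z >= 0 else 0.0


def activation(w, x, gain=1.0):
    """Sigmoid activation of one unit with weights w and input x.

    `gain` is a hypothetical steepness factor, used only to show that
    sigmoid(gain * z) approaches step(z) as gain grows (for z != 0).
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(gain * z)


if __name__ == "__main__":
    w, x = [0.5, -1.0], [1.0, 0.2]  # illustrative values; w . x = 0.3
    for gain in (1, 10, 100):
        print(gain, activation(w, x, gain), step(0.3))
```

The output always lies in $[0, 1]$, and as `gain` increases it moves toward the hard threshold value. That is the sense in which the sigmoid simulates threshold behaviour, independently of any probabilistic interpretation.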

Olivier Roche