I am studying the Perceptron algorithm. Right now I am trying to understand if the initialization choice of $\theta$ effects the algorithm's ability to converge. I have tried experimenting by applying the algorithm to a linearly separable training set with different initial $\theta$ values, one being zero and the others non-zero values and the algorithm converged every time. However, I can't come up with a way of mathematically generalizing this pattern. I am trying to figure out how to generalize this pattern; come up with a proof for why the algorithm always converges.
-
hi, you mind find this helpful: https://en.wikipedia.org/wiki/Perceptron#Convergence also, this question does not provide explanation of what $\theta$ means (and if that includes an offset etc etc). It would be nice if you add that so that the question is self contained. – Charlie Parker Feb 15 '16 at 01:04
-
1you should ask more generally when does a gradient descent converge to the global minimum whatever is the starting point – reuns Feb 15 '16 at 01:09
-
did that help you? If it did resolve your question don't forget to upvote or accept. Otherwise, follow up my answer please. – Charlie Parker Feb 15 '16 at 02:55
-
@user1952009 what is the answer to that? – Charlie Parker Feb 15 '16 at 06:41
1 Answers
As long as the data set is linearly separable, the perceptron algorithm will always converge in $ \frac{R^2}{\gamma^2} $ iterations. The initialization does not matter.
The proof is a standard thing they explain in any ML course at university (not super trivial to come up with but simple to understand by reading the actual proof). See these notes for example:
http://www.cs.columbia.edu/~mcollins/courses/6998-2012/notes/perc.converge.pdf
I found it by googling "perceptron convergence proof", hope it helps.
In fact the beauty of the prove above is that it shows that if you initialize it to zero then you errors are bounded by $R^2/\gamma^2$. So in a way, it shows that if you start with the vector 0 and update, you eventually converge in a finite number of iterations.
If the initialization affects the speed of convergence, that depends on the data set and in what order you see ur points.
- 6,546