As I understand it, this is a simple but complete description of the SVM algorithm:
There is a set of elements (mathematically, points). These elements are described as ordered pairs from the Cartesian product of two sets X and Y. The approach is to draw a line in the "plane" X x Y such that:
1.1. The points from different classes lie on opposite sides of the line
1.2. The parameters of the line are chosen so that the minimum distance from the points to the line is maximized
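To make conditions 1.1 and 1.2 concrete, here is a minimal sketch in Python. The points, labels and the two candidate lines are made up for illustration, and the helper names `signed_distance` and `margin` are my own, not taken from any library. It only compares two given lines; it does not search for the best one.

```python
import math

def signed_distance(w, b, point):
    """Signed distance from `point` to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return (w[0] * point[0] + w[1] * point[1] + b) / norm

def margin(w, b, points, labels):
    """Smallest label-adjusted distance from the points to the line.

    A positive value means every point is on its own side (condition 1.1);
    the larger the value, the better the line under condition 1.2.
    """
    return min(y * signed_distance(w, b, p) for p, y in zip(points, labels))

# Hypothetical toy data: two classes, labelled -1 and +1.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, +1, +1]

# Two candidate lines, each written as (w, b).
line_a = ((1.0, 0.0), -2.0)  # the vertical line x = 2
line_b = ((1.0, 0.0), -1.5)  # the vertical line x = 1.5

margin_a = margin(*line_a, points, labels)
margin_b = margin(*line_b, points, labels)
print(margin_a, margin_b)
```

Both lines satisfy 1.1, but `line_a` leaves a larger minimum distance, so it is the better choice under 1.2.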
If no line separates the points in the original coordinates, then we do the following:
2.1 Apply a bijective change of coordinates to the points, using a suitable nonlinear transformation, and try to find the line in the new coordinates
2.2 If we find such a transformation, then we can apply the inverse transformation to map our line back to the original coordinates
2.3 If we still cannot find such a line, we can modify the optimization criterion so that some mistakes are allowed, and assign a weight to these mistakes
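Steps 2.1 to 2.3 can be sketched the same way. The example below is an assumption-laden illustration of the description above, not how SVM libraries actually implement kernels: I picked polar coordinates as the nonlinear change of coordinates (bijective on the plane without the origin, with the angle in (-pi, pi]), the data and the threshold `R0` are made up, and `soft_margin_objective` uses the common form "half the squared norm of w plus C times the sum of hinge losses", with `C` as the weight on mistakes.

```python
import math

def to_polar(point):
    """Step 2.1: nonlinear change of coordinates (x, y) -> (r, theta)."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))

def from_polar(point):
    """Inverse change of coordinates (r, theta) -> (x, y)."""
    r, theta = point
    return (r * math.cos(theta), r * math.sin(theta))

# Hypothetical data: class -1 near the origin, class +1 around it.
# No straight line separates these in the original (x, y) coordinates.
inner = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
outer = [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0), (0.0, -3.0)]
points = inner + outer
labels = [-1] * len(inner) + [+1] * len(outer)

# In (r, theta) coordinates the line r = R0 separates the classes.
R0 = 2.0

def classify(point):
    r, _theta = to_polar(point)
    return 1 if r > R0 else -1

predictions = [classify(p) for p in points]

# Step 2.2: mapping the line r = R0 back gives a circle of radius R0.
boundary = [from_polar((R0, k * math.pi / 4)) for k in range(8)]

def soft_margin_objective(w, b, features, labels, C):
    """Step 2.3: allow mistakes, but pay C times the hinge loss for each."""
    hinge = sum(
        max(0.0, 1.0 - y * (w[0] * z[0] + w[1] * z[1] + b))
        for z, y in zip(features, labels)
    )
    return 0.5 * (w[0] ** 2 + w[1] ** 2) + C * hinge

w, b = (1.0, 0.0), -R0  # the line r = R0 in the new coordinates
features = [to_polar(p) for p in points]
clean = soft_margin_objective(w, b, features, labels, C=1.0)

# One made-up class +1 point that lands on the wrong side of the line.
noisy = soft_margin_objective(
    w, b, features + [to_polar((1.5, 0.0))], labels + [+1], C=1.0
)
print(predictions == labels, clean, noisy)
```

With the clean data the hinge term is zero; the single misplaced point adds a penalty proportional to how far it sits on the wrong side, scaled by `C`.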
QUESTIONS:
I have a bunch of questions. I would be glad to get an answer to any of them:
q1. Why do (some) machine learning people use such complicated terminology, as in the Russian Wikipedia article about SVM?
I assume they are familiar with concepts such as the Cartesian product of two arbitrary sets, and with the concept of a 'line'. I really don't understand the reason for the extra sophistication in the explanation.
q2. I used the term 'line'. I do not understand why they use the term 'hyperplane'. The word is 10/4 times longer, and the term frightens me.
q3. Why do machine learning people create the extra term 'kernel trick'? That is what they call step 2.1. In numerical methods, a 'nonlinear change of coordinates' is simply called a 'nonlinear change of coordinates'. Why introduce a new term which, on top of that, gives a misleading hint about convolution kernels?
q4. I don't see any information in the English Wikipedia article on SVM about the algebraic structure in which it is defined. Step number "0" in any mathematical subfield is to define the objects you are "natively" working with.
Why do machine learning people skip this step?
(1) "It's cool but..." In general, the sort of problems you see in ML live in $\mathbb{R}^d$, or $\{0,1\}^d$, or something similar.
(2) "Really all engineers..." As someone working in statistics who is a mathematician by training, I agree that more formalism might be desirable for a rigorous understanding. But oftentimes the formalism is unnecessary and distracting when one considers the method a means to an end.
– Empiromancer Nov 11 '15 at 17:11