SVM algorithm for machine learning: in which algebra is it constructed, and how is it described?

Question

As I understand it, this is a simple but complete description of the SVM algorithm:

There is a set of elements (mathematically, points). These elements are described as ordered pairs from the Cartesian product of two sets X and Y. The approach is to draw a line in the "plane" X x Y such that:

1.1. The points from different classes are on opposite sides of a given line

1.2. The parameters of the line are chosen so that the minimum distance from the points to the line is maximized

If no line separates the points in the original coordinates, then we do the following:

2.1 Make a bijective change of coordinates of the points using a suitable nonlinear transformation, and try to find the line in the new coordinates

2.2 If we find such a transformation, we can map our line back to the original coordinates with the inverse transformation

2.3 If we still cannot find such a line, we can modify the optimization criterion to allow some mistakes and assign a weight to these mistakes
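Steps 1.1 to 2.1 can be sketched on toy data. This is only a minimal sketch, not the SVM optimizer itself: the data, the map `feature_map` (x -> (x, x^2)) and the line given by `w` and `b` are all hypothetical and picked by hand rather than found by optimization. Note also that this map sends the real line into the plane instead of being a bijective change of coordinates of the same space, which is an assumption that goes beyond the wording of step 2.1.

```python
import math

def feature_map(x):
    """Hypothetical nonlinear map from the line to the plane: x -> (x, x^2)."""
    return (x, x * x)

def signed_distance(w, b, p):
    """Signed distance from point p to the line {z : w.z + b = 0}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * pi for wi, pi in zip(w, p)) + b) / norm

def separable_by_threshold(xs, ys):
    """True if a single threshold puts each class on its own side in 1-D."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

# Toy 1-D data (assumed): the negative class sits between the positives.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [1, 1, -1, -1, 1, 1]

# Hand-picked line in the new coordinates: second coordinate equals 2.5.
w, b = (0.0, 1.0), -2.5

# Step 1.1: every point is on the side matching its label (all values positive).
# Step 1.2: the smallest of these values is the quantity an SVM would maximize.
margins = [y * signed_distance(w, b, feature_map(x)) for x, y in zip(xs, ys)]
separated = all(m > 0 for m in margins)
margin = min(margins)

print(separable_by_threshold(xs, ys), separated, margin)
```

In the original coordinate no threshold works, while after the map the hand-picked line separates the classes, and `margin` is the smallest distance from a point to that line.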

QUESTIONS:

I have a bunch of questions. I'll be glad for an answer to any of them:

q1. Why do (some) people in machine learning use such complicated terminology, as in the Russian wiki article about SVM?

I think they are familiar with concepts such as the Cartesian product of two arbitrary sets, and with the concept of a 'line'. I really don't understand the reason for the extra sophistication in the explanation.

q2. I used the term 'line'. I do not understand why they use the term 'hyperplane'. This word is 10/4 times longer, and the term frightens me.

q3. Why do people in machine learning create the extra term 'kernel trick'? They call step 2.1 by this name. In numerical methods a 'nonlinear change of coordinates' is called a 'nonlinear change of coordinates'. Why introduce a new term, which besides gives a misleading hint about convolution kernels?

q4. I don't see any information in the English wiki article on SVM about the algebra in which it is defined. Step number "0" in any mathematical subfield is to define the objects with which you "natively" work.

Why do people in machine learning skip this step?

Empiromancer · Answer 1 · 2015-11-11T15:36:42.383

0

Q2. The term hyperplane is preferred in this sort of classification problem because one frequently wants to work with high-dimensional data. When working in a $2$-dimensional space, "dividing the space in two" means finding a line ($1$-dimensional affine subspace) that separates your two groups of points. When working in a $d$-dimensional space, we want to look for a $(d-1)$-dimensional affine subspace, i.e. a plane for $d = 3$ or a hyperplane for $d \geq 4$. "Hyperplane" is the most general way to refer to it in arbitrary $d$.
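To illustrate that one description covers every $d$, here is a small sketch. It assumes the usual way of writing a hyperplane as the set of points $z$ with $w \cdot z + b = 0$; the function names and sample points are hypothetical and chosen only for illustration.

```python
import math

def side_of_hyperplane(w, b, x):
    """Return +1, -1 or 0: the side of {z : w.z + b = 0} on which x lies."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (s > 0) - (s < 0)

def distance_to_hyperplane(w, b, x):
    """Unsigned distance from x to the hyperplane {z : w.z + b = 0}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# d = 2: the "hyperplane" is the line x1 + x2 = 1.
line_side = side_of_hyperplane((1.0, 1.0), -1.0, (2.0, 2.0))

# d = 3: the "hyperplane" is the ordinary plane x3 = 0.
plane_dist = distance_to_hyperplane((0.0, 0.0, 1.0), 0.0, (5.0, -7.0, 2.0))

# d = 5: nothing in the code changes, only the length of the tuples.
hyper_dist = distance_to_hyperplane((1.0,) * 5, 0.0, (1.0,) * 5)

print(line_side, plane_dist, hyper_dist)
```

The same two functions handle the line, the plane and the 5-dimensional case, which is the practical reason for having a single word for all of them.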

Q1, Q3. Historically, machine learning has developed from computer science (artificial intelligence, neural networks, etc.). It's only within the last 20 or so years that it has become heavily intertwined with statistics. Hence, a lot of the terminology of ML is different even from other fields of statistics. It's less "introducing unnecessary new terms" and more that terms in ML developed on their own, and by the time the field reached the level of mathematical/statistical sophistication it has today, people in the field were uninterested in changing terminology to match that of mathematicians.

Q4. Once again, I think it helps to remember that this is a field that was started by engineers and computer scientists. Historically, they're working with different ideas of mathematical rigor - the first instinct of an engineer or a statistician is not to consider the algebra in which he or she is working.

Generally, I think it would be a mistake to consider ML a mathematical sub-field. There may be mathematical ways of formulating and understanding algorithms from ML, but the people working on it are statisticians and computer scientists. Thus, many of the assumptions and defaults from mathematics don't carry over.

edited Nov 11 '15 at 15:36

answered Nov 11 '15 at 15:23

Empiromancer

895

q1 -- X x Y -- please take X as R^n and Y as R^m, or take X as L2 functions and Y as scalars, if you want. I disagree with you. It is just a line. To be honest, I have not heard of a concept like "high dimensional" in math. There are concepts like "finite-dimensional space" and "infinite-dimensional space". – Konstantin Burlachenko Nov 11 '15 at 15:53
"It is just a line.." There is no difference; the aim is just not to create extra terms. I do not understand why you mentioned "affine space"; moreover, I don't know of fields other than geometry that work with such a concept. And at least if you take X and Y to be Cartesian powers of the real numbers, then for setting up and solving the task you do not need translations at all. I think you meant a linear subspace. – Konstantin Burlachenko Nov 11 '15 at 16:07
q4 - That's fine, but ML uses terminology from linear algebra and probability, and in the end you need to solve an optimization problem via numerical methods. The question is: x and y are elements of which space? Can they be functions, e.g. from L1, L2, Schwartz, Support functions, or must they be elements of a finite-dimensional space? I can imagine a line in X x Y space, or a "hyperplane", but I'd like to know the generality. Short question -- is the space in which the task is solved linear (over the field of real numbers)? – Konstantin Burlachenko Nov 11 '15 at 16:13
Really, all engineers who are involved in the natural sciences should use math. Just check the article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" by Eugene P. Wigner, 1959. Of course, you can consider this not to be a field of math. But it would be better to use more precise derivations, and you should draw the boundaries of where the method applies. – Konstantin Burlachenko Nov 11 '15 at 16:18
P.S. In any case, many thanks for the answers, but I cannot accept them; at least, my mind cannot accept them currently. – Konstantin Burlachenko Nov 11 '15 at 16:23
@bruziuz You've left me quite a few comments to respond to, but I'll see what I can do by way of answering them.
(1) "It's cool but..." In general, the sort of questions you see in ML are working in $\mathbb{R}^d$, or $\{0,1\}^d$, or something similar.

(2) "Really all engineers..." As someone working in statistics who is a mathematician by training, I agree that more formalism might be desirable for a rigorous understanding. But oftentimes the formalism is unnecessary and distracting when one considers the method a means to an end.
– Empiromancer Nov 11 '15 at 17:11
@bruziuz (3) "It is just a line..." It's true that if one restricts oneself to a two-dimensional space $X \times Y$ as you asked about, the separator will be a line. But you asked why the term "hyperplane" was used instead of line, and that's because the SVM algorithm is often used for a more generalized scenario with, for example, $X_1 \times X_2 \times \ldots \times X_n$. If we assume for simplicity $X_i = \mathbb{R}$ for all $i$, we see that a line does not separate points in $\mathbb{R}^3$ - rather, we want to find a plane that separates points. – Empiromancer Nov 11 '15 at 17:19
@bruziuz "q1 X x Y..." In ML problems, you're almost always working with finite-dimensional spaces (really almost always $\mathbb{R}^d$ or $\{0,1\}^d$ or their ilk), but how large $d$ is genuinely does matter. Larger $d$ means the data is more difficult to visualize, perhaps more computation time is needed to work with it, and there are some tangible statistical implications as well (which are difficult to explain in the comments). Thus, the idea of a "high dimensional space" is one with practical importance. – Empiromancer Nov 11 '15 at 17:25
Ok. Thanks. For infinite sets it has been shown that any power of the set is equal to the original set (a bijective mapping exists). So there is no problem drawing it at all. Of course, the illustration will not be what you want. :) But in general you're right: we can plot 2d curves and 3d surfaces, or 3d surfaces as level surfaces of a 4d signal. I have no experience in mathematical statistics with tuples of r.v.s. I have worked with a single random variable, because mathematical statistics relies on limit theorems, which are formulated for scalar r.v.s. – Konstantin Burlachenko Nov 11 '15 at 20:22
We cannot discuss it here, but if you have a link to, or the name of, a theorem saying something interesting about tuples instead of scalar r.v.s, please provide it. That would be great, and I'll be glad to read it. – Konstantin Burlachenko Nov 11 '15 at 20:39
