Understanding the equation for margin in linear classification

Question

In linear classification, the margin is the distance that the closest point is to the separating hyperplane. It is useful because not all hyperplanes are equal, so taking the hyperplane with the largest margin is an intuitive way to select the best hyperplane. But I do not understand this equation:

$$ \epsilon = \max_{|W| \leq 1} \min_i {w^Tx_i \cdot y_i} $$

where $\epsilon$ is the margin, $w$ is a hyperplane in the set of possible hyperplanes $W$, $x_i$ is sample $i$, $y_i \in \{-1, 1\}$ is the label for sample $i$.

Can someone explain what the equation is doing? I get what we want to do conceptually, but do not understand this.

score 4 · Accepted Answer · answered Oct 10 '16 at 21:36

We want $w^Tx_i$ (prediction) and $y_i$ (truth) to share the same sign, that is we desire that $$w^Tx_i y_i \geq 0$$

Suppose this condition can be satisfied, which hyperplane $w^Tx=0$ should we choose?

Suppose the hyperplane is given to us, $w^Tx=0$ and $\left\| w\right\|=1$, let's compute the distance of a point $x_iy_i$ from the hyperplane.

Note that $w$ is orthogonal to the hyperplane and $\left\| w\right\|=1$, hence the distance of the point $x_iy_i$ from the hyperplane can be compute as $w^Tx_iy_i$ (i.e. projecting to the direction that is orthogonal to the hyperplane)

Given a training data set, the smallest distance from the hyperplane is then

$$\min_i w^Tx_iy_i$$

We want to choose $w$ that maximizes this quantity, hence the problem becomes

$$\max_{\left\|w\right\|=1}\min_i w^Tx_iy_i$$

It is easy to see that

$$\max_{\left\|w\right\|=1}\min_i w^Tx_iy_i=\max_{\left\|w\right\| \leq 1}\min_i w^Tx_iy_i$$

This makes sense. Just to be clear, w is actually the separating hyperplane, right? (You say that "w is orthogonal to the hyperplane"). What does it mean that w^T * x = 0? I assume this means the data is linearly separable, but isn't w^T * x just the sum of the distances between each point x_i and the hyperplane w? — jds, Oct 12 '16 at 13:15
Hyperplane can be described by a linear equations, for example $2x_1+3x_2=0$.
For this hyperplane, $$w= \begin{bmatrix} 2 \ 3 \end{bmatrix},$$ which is just a vector. If you plot the graph of $2x_1+3x_2=0$, and the vector direction $w=(2,3)$, we can see that they are orthogonal. — Siong Thye Goh, Oct 12 '16 at 15:33

Understanding the equation for margin in linear classification

1 Answers1