In linear classification, the margin is the distance from the closest point to the separating hyperplane. It is useful because not all separating hyperplanes are equally good, so choosing the hyperplane with the largest margin is an intuitive way to select the best one. But I do not understand this equation:
$$ \epsilon = \max_{w \in W,\ \|w\| \leq 1} \; \min_i \; y_i \, w^T x_i $$
where $\epsilon$ is the margin, $w$ is a hyperplane in the set of possible hyperplanes $W$, $x_i$ is sample $i$, and $y_i \in \{-1, 1\}$ is the label for sample $i$.
Can someone explain what the equation is doing? I get what we want to do conceptually, but do not understand this.
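A minimal sketch in plain Python of what the equation computes (the function names and the toy data are hypothetical, and the brute-force angle sweep only works in 2-D). For a fixed $w$, the inner $\min_i$ scores every sample by $y_i w^T x_i$, which is positive when $x_i$ lies on the correct side and, when $\|w\| = 1$, equals its signed distance to the hyperplane $w^T x = 0$. The outer $\max$ then picks the direction whose worst-scored sample is best. The sketch assumes the data are separable by a hyperplane through the origin (the equation has no bias term); in that case the maximum over $\|w\| \leq 1$ is attained on the unit circle, so the sweep only tries unit vectors.

```python
import math

def functional_margin(w, xs, ys):
    """min_i y_i * w^T x_i for a fixed w (no bias term, as in the equation)."""
    return min(y * sum(wj * xj for wj, xj in zip(w, x)) for x, y in zip(xs, ys))

def geometric_margin(w, xs, ys):
    """The same score after rescaling w to unit norm: the worst signed distance."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return functional_margin(w, xs, ys) / norm

def max_margin_2d(xs, ys, steps=3600):
    """Approximate the max over unit vectors w by sweeping angles (2-D only)."""
    best_w, best_eps = None, -math.inf
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))
        eps = functional_margin(w, xs, ys)
        if eps > best_eps:
            best_w, best_eps = w, eps
    return best_w, best_eps

# Hypothetical toy data: two positive and two negative samples.
xs = [(2, 2), (3, 1), (-1, -2), (-2, -1)]
ys = [1, 1, -1, -1]

print(functional_margin((1, 0), xs, ys))  # 1
print(functional_margin((2, 0), xs, ys))  # 2: doubling w doubles the score
print(geometric_margin((2, 0), xs, ys))   # 1.0: unchanged once w is normalized
w_best, eps = max_margin_2d(xs, ys)
print(w_best, eps)  # roughly (0.707, 0.707) and 3/sqrt(2), about 2.1213
```

The norm constraint is what makes the max meaningful: without it, scaling $w$ up scales every score, so $\epsilon$ would be unbounded. A practical solver would not sweep directions; the standard hard-margin SVM solves an equivalent quadratic program (minimize $\|w\|^2$ subject to $y_i w^T x_i \geq 1$).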
$w$ is actually the separating hyperplane, right? (You say that "$w$ is orthogonal to the hyperplane".) What does it mean that $w^T x = 0$? I assume this means the data is linearly separable, but isn't $w^T x$ just the sum of the distances between each point $x_i$ and the hyperplane $w$? – jds Oct 12 '16 at 13:15

For this hyperplane, $$w = \begin{bmatrix} 2 \\ 3 \end{bmatrix},$$ which is just a vector. If you plot the graph of $2x_1 + 3x_2 = 0$ and the vector direction $w = (2, 3)$, we can see that they are orthogonal.
– Siong Thye Goh Oct 12 '16 at 15:33
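To make the reply concrete, a small sketch in plain Python using the same $w = (2, 3)$ (the helper names `dot` and `signed_distance` and the test points are hypothetical illustrations). It checks that $w$ is orthogonal to the line $2x_1 + 3x_2 = 0$ and that, for a single point $x$, $w^T x / \|w\|$ is its signed distance to that line. So $w^T x = 0$ picks out the points lying on the hyperplane; it is not itself a statement about separability, and the margin equation takes the minimum of the per-point scores over $i$, not their sum.

```python
import math

# w = (2, 3) is taken from the reply above.
w = (2.0, 3.0)

def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def signed_distance(w, x):
    """Signed distance from point x to the hyperplane {x : w^T x = 0}."""
    return dot(w, x) / math.sqrt(dot(w, w))

# Every point on the line 2*x1 + 3*x2 = 0 is a multiple of (3, -2).
d = (3.0, -2.0)
print(dot(w, d))                         # 0.0: w is orthogonal to the line
print(signed_distance(w, (3.0, -2.0)))   # 0.0: this point lies on the line
print(signed_distance(w, (2.0, 3.0)))    # about 3.606 (sqrt(13)), on the side w points to
print(signed_distance(w, (-2.0, -3.0)))  # about -3.606, on the opposite side
```

Replacing $w$ by $(4, 6)$ describes the same line and leaves these distances unchanged, but doubles $w^T x$, which is why the margin equation constrains $\|w\|$.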